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Abstract 

Enhanced reality visualization isthe process of enhancing an image by adding 
to it information which is not present in the original image. A wide variety of 
information can be added to an image ranging from hidden lines or surfaces to 
textual or iconic data about a particular part of the image. Enhanced reality 
visualization is particularly well suited to neurosurgery. By rendering brain 
structures which are not visible, atthecorrect location in an image of a patient's 
head, the surgeon is essentially provided with X-ray vision. H e can visual izethe 
spatial relationship between brain structures before he performs a craniotomy 
and during the surgery he can see what's under the next layer before he cuts 
through. Given a video image of the patient and a three dimensional model of 
the patient's brain, the problem enhanced reality visualization faces is to render 
the model from the correct viewpoint and overlay it on the original image. The 
relationshi p between the coordinateframes of the patient, the patient's internal 
anatomy scans and the i mage pi ane of the camera observi ng the pati ent must be 
established. This problem is closely related to the camera calibration problem. 
This report presents a new approach to finding this relationship and develops a 
system for performing enhanced reality visualization in a surgical environment. 
Immediatelypriortosurgeryafewcircularfiducialsareplacednearthesurgical 
site. An initial registration of video and internal data is performed using a 
laser scanner. Following this, our method is fully automatic, runs in nearly 
real-time, is accurate to with in a pixel, allows both patient and camera motion, 
automatically corrects for changes to the internal camera parameters (focal 
length, focus, aperture, etc.) and requires only a single image. 
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Chapter 1 
Introduction 

1.1 Computers in Medicine 

The use of computers in medicine has increased dramatically over the last 
decade [Bemmel et al., 1985, Reggia and Tuhrim, 1985, Miller, 1990, Chang, 
1993]. As a result, nearly all aspects of medical care have seen improvement 
from the introduction of computer-based tools. These tools range from auto- 
mated patient record keeping to three dimensional visualization of internal 
anatomy and from computer assisted diagnosis to automatic drug interaction 
and allergy screening. The use of computers to assist physicians in the plan- 
ning and execution of surgical procedures i sal so growing [Lemoineetal., 1991, 
Smith et al., 1991, Pieper et al., 1992, Verbeeck et al., 1993]. One area which 
could benefit greatly from more sophisticated computer-based tools is neuro- 
surgery. The need to minimize collateral damage while removing diseased 
tissue requires extreme precision. In addition, damageto certain critical brain 
regions, such as the motor strip, must be avoided if at all possible. Planning 
a surgical approach meeting all of the criteria is difficult and tedious. Identi- 
fication of specific brain structures and modification of the planned approach 
are often difficult during the surgical procedure, placing additional emphasis 
on planning. Traditionally, precision neurosurgery requires the use of a stereo- 
tactic frame which is rigidly attached to the patient's skull. Figure 1-1 shows 
some typical stereotactic frames. The frame is attached prior to and is visible 
in presurgical imaging such as magnetic resonance (MR) or computed tomog- 
raphy (CT) imaging. This allows surgical plans based on presurgical internal 
anatomy scans to be transferred to the patient using the stereotactic frame 
as a reference. Frequently the patient must wear the frame for several days 
between imaging and surgery. The frames are a significant discomfort to the 
patient and are cumbersome to the surgeon, possibly limiting his flexibility 
during the procedure. 

A system which improves surgical precision, enables identification of brain 
structures, allows modification of the planned approach during the surgical 
procedure and does not require the use of a stereotactic frame would be a vast 
improvement over traditional neurosurgical procedures. 

13 
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Figure 1-1: Some typical stereotactic frames. 



1.2. WHAT IS ENHANCED REALITY VISUALIZATION? 
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1.2 What is Enhanced Reality Visualization? 

Computer assisted surgery is a relatively new development which attempts 
to provide the surgeon with a tool to assist in the planning and execution of 
surgical procedures [Adams et al., 1990, Lavalleeand Cinquin, 1990]. Image 
guided surgery is a specific type of computer assisted surgery which uses ad- 
vanced three dimensional visualization techniques to providethe surgeon with 
a wealth of valuable information not normally available in the operating room 
[Pel izzari et al ., 1991, Wei Is et al ., 1993, Black et al ., 1993, Gri mson et al ., 1994 ]. 
In essence, a complex surgical procedure can be navigated visually with great 
precision by overlaying on an image of the patient a color coded preoperative 
plan specifying details such as the locations of incisions, areas to be avoided 
and the diseased tissue. The process of aligning the preoperative plans with 
and overlaying them on the patient is known as enhanced reality visualization. 
Enhanced reality visualization isthe process of enhancing an image by adding 
information to it. The information added can be anything from text to icons or 
color coding to three dimensional surfaces. Figure 1-2 shows an enhanced re- 
ality visualization of a patient a bout to undergo neurosur gery The area shown 
in ^ is the tumor to be removed and the ventricles are shown inH. 




iL, 




J 



Figure 1-2: Enhanced reality visualization showing a tumor and ventricles. 
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1.3 A Scenario Which Utilizes Enhanced Reality 
Visualization 

1. A patient needing neurosurgery is scanned by one or more three dimen- 
sional, high resolution, internal anatomy scanners, such as magnetic res- 
onance (M R) or computed tomography (CT). 

2. Each internal anatomy scan is segmented by tissue type (white matter, 
gray matter, bone, etc). Thevariousscansofthepatientarealsoregistered 
with one another. The result is a three dimensional model of the patient's 
brain. 

3. Tools which identify, classify and label brain structures such as motor 
strip and speech centers are used to add the required detail to the model 
of the patient's brain. 

4. Once the model of the patient's brain has been constructed, enhanced re- 
ality visualization is possible. Enhanced reality visualization can beused 
to help plan the surgical procedure. Live video of the patient is mixed 
with information generated from the brain model allowing the surgeon to 
view the internal anatomy of the patient in a non-invasive manner. This 
capability allows the surgeon totest thefeasi bility of various possiblesur- 
gical approaches on theactual patient. Theenhanced reality visualization 
may be presented to the surgeon using one of several methods, such as a 
head-mounted display, a transparent projection screen or a surgical mi- 
croscope. Details regarding the surgical approach and procedure can be 
added to the brai n model . 

5. The surgical procedure is performed using enhanced reality visualization. 
Enhanced reality enables the surgical site to be precisely located and 
facilitates accurate transfer of surgical plans to the patient. 



1.4 The Problem and Our Approach 

Enhanced reality visualization is an integral part of the image guided surgery 
paradigm, however compared with other aspects, I i tt I e effort has been expended 
on this area [Adams et al., 1990, Lavalleeand Cinquin, 1990, Pelizzari et al., 
1991, Wells et al., 1993, Grimson et al., 1994]. I n order to produce enhanced 
reality visualizations we must be ableto quickly and accurately align informa- 
tion such as a brain model with an image. There are several other issues which 
must be addressed before a full function enhanced reality visualization system 
can be created. Some of these challenges are listed below. 
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Display method The displays currently available for enhanced reality visu- 
alization are less than optimal. Head mounted displays are still heavy, 
awkward and have relatively low resolution. Conventional CRT's have 
better resolution but limit the applications of enhanced reality visualiza- 
tion. 

Rendering The complexity of information and the detail and realism with 
which it can be rendered while updating at a reasonable frame rate are 
limited. 

Information acquisition Acquiring information and converting it to a form 
suitable for use in enhanced reality visualization can be a difficult and 
time consuming process. 

Virtual reality faces many of the same issues as enhanced reality visual- 
ization. There already exists a significant research effort in virtual reality 
examining these problems. Whilethereare many similarities, enhanced real- 
ity differs fundamentally from virtual reality in that it is anchored in the real 
world. Enhanced reality visualization must align the enhancement with a real 
image quickly and accurately. The ability to perform this alignment quickly 
and accurately isfundamental toenhanced reality visualization and will bethe 
focus of this report. Given a video image and a three dimensional model, the 
problem is to render the model from the correct viewpoint and overlay it on the 
original image. The relationship between the coordinate frames of the world, 
the model and the image plane of the camera must be established. This prob- 
lem is closely related to the camera calibration problem. Stated more precisely 
the problem is to: 

Determinethe perspective transformation which maps model coordi- 
nates to image coordinates in "real-time", allowing the information 
from the model to be added to an i mage i n the correct I ocati on. 

An overview of the problem is shown in Figure 1-3. Wesolvefor the transforma- 
tion which maps model coordinates to image coordinates directly. An alternate 
approach solves for several transformations and then composes them into a 
single transformation from model to image coordinates. For example, a refer- 
ence coordinate system could be defined for the physical object(s). Thefirst step 
might be to find the transformation which aligns the model with the reference 
coordinate system. Next, the transformation from reference coordinates to im- 
age coordinates must be determined. Solving for intermediate transformations 
can introduce error into the solution and is computationally more expensive. 
Unless this data is needed there is no reason to break the problem into several 
pieces. 
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Enhanced Reality 
Visualization with 
H i dden L i nes Added 

Figure 1-3: Overview of this report. 



We will define several terms that will be used throughout this report. An 
enhanced reality visualization is composed of a virtual imageoverlayedonaraw 
image. A raw image is an image of the real world prior to any enhancement. 
The coordinates of the raw image are referred to as image coordinates. The 
information which will be added tothe raw imagetoproducethe enhancement is 
referred to as the model. The coordinates of the model are referred to as model 
coordinates. A virtual image is generated by rendering the model from a 
particular view point. The view point captures the relative placement and 
orientation between model and viewer. Finally, the term world coordinates 
is used to refer to an arbitrary coordinate system attached to an object visible 
in the raw image. 

Figure 1-4 shows an overview of our method in the context of neurosurgery. 
A novel formulation of the camera calibration problem allows us to quickly 
and easily obtain the perspective transformation mapping model (MR or CT) 
coordinates to image coordinates. Theperspectivetransformation is then used 
to generate the enhanced reality visualization. Our approach utilizes several 
circular fiducials placed near the surgical site. 

Thecircular fiducialsenableus to recover a scalefactor at each fiducial (the 
focal length of the camera divided by the depth of the fiducial). Given the scale 
factor as well as the image and model (MR or CT) coordinates for each fiducial, 
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Patient with 
Fiducials Attached 




Video I mage of Patient 
with Fiducials Attached 



Model of Patient's 
Brain and Fiducials 



Enhanced Reality 
Visualization of 
the Patient with 
the Brain Added 

Figure 1-4: Overview of this report showing the circular fiducials. 




Laser Scanner Alignment 




Video I mage of Patient 
with Fiducials Attached 



<S> 



Model of Patient's 
Brain Obtained from 
MR and/or CT Data 



Mode! of Patient's 
Brain and Fiducials 



Figure 1-5: Determining the model (MR) coordinates of the fiducials. 
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the problem of determining the perspective transformation which maps model 
coordinates to image coordinates reduces to three sets of linear equations. In 
general, the model coordinates of the fiducial s are not known a priori, and they 
must be calibrated. Figure 1-5 depicts the process of determining the model 
(MR or CT) coordinates of the fiducial s. During an initial calibration phase a 
laser scanner is used to register the world coordinates of thefiducials with the 
model coordi nates of the MR or CT data. Oncethe initial calibration is complete, 
our method isfully automatic and uses visual information exclusively. Some of 
the additional characteristics of our approach which make it particularly well 
suited to enhanced reality visualization are: 

1. Requires only a single image with a fewfiducials 

2. Does not require internal calibration or known focal length 

3. Accurate to within a pixel 

4. Solution is non-iterative 

The remaining chapters of this report are organized as follows: Chapter 2 
reviews current enhanced reality visualization techniques. Chapter 3 devel- 
ops a basic camera model and contains a brief discussion of current camera 
calibration techniques. Chapter 4 presents our method and an overview of its 
implementation. Chapter 5 provides the details (theory, error analysis and 
empirical results) associated with circular fiducials. Chapter 6 discusses one 
method of calibrating fiducial s. Chapter 7 shows the results of our method for 
a test object and a plastic skull. Finally, Chapter 8 presents our conclusions. 



Chapter 2 
Related Work 



Several groups of researchers have recently been investigating enhanced real- 
ity. The proposed applications for enhanced reality range from laser printer 
repair to aircraft manufacture. Current research efforts in enhanced reality 
visualization differ in many implementation details. The one thing they all 
have in common is the requirement to align a model with an image of the real 
world. In this chapter we will examine several different approaches. 



2.1 Medical Applications 

2.1.1 Aachen U ni versity of Technology 

A group at the Aachen University of Technology in Germany has developed 
a "Computer Assisted Surgery" module for use in ENT surgical procedures 
[Adams et al., 1990]. A model of the patient is produced from presurgical CT 
scans. Radiopaque markings are attached to the patient's skull prior to the 
presurgical scans for use as reference points. The system is calibrated using 
a hand-guided electro-mechanical three dimensional coordinate digitizer. The 
digitizer is used to measure the positions of several reference points. With 
correspondence between digitizer and CT points the transformation from the 
CT coordinates to digitizer coordinates can becalculatedusing3D/3Dmatching. 
Duringan operation the surgeon can usethedigitizertopointatan unidentified 
structure. Three perpendicular views of the CT data corresponding to the 
location of the digitizer are then displayed on a nearby CRT. The system must 
be recalibrated every time the patient moves with respect to the digitizer. The 
reported accuracy is better than ±lmm. 



2.1.2 TIMB 

A group at TIMB in Grenoble, France has developed a "Computer Assisted 
Medical Intervention" module [Laval lee and Cinquin, 1990]. A model of the 
patient's internal anatomy is produced from presurgical imaging. This system 

21 
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uses a surgical robot or guidance system with modes that range from passive 
to semi -autonomous. I n passive mode the robot provides visual feedback by 
overlaying the position of an instrumented probe on the presurgical scans. I n 
semi -autonomous mode some portions of the surgical procedure are performed 
by the robot under the supervision of thesurgeon. Thesystem is calibrated us- 
ing a special calibration cage made of four Plexiglas planes containing metallic 
balls. The calibration cage is placed near the patient and a pair of X-ray im- 
ages are taken. The relationship between the presurgical imaging, the X-ray 
device and the surgical robot can be established using 3D/3D matching. Again 
if the patient moves relative to the robot (or the X-ray device) the system must 
be recalibrated. Accuracy for the instrumented probe is reported as ~5mm. 
Accuracies for other modes are not reported. 

2.1.3 University of Chicago 

A group at the U niversity of Chicago has developed a method for "I nteractive 
3D Patient - 1 mage Registration" [Pelizzari et al., 1991]. The method is used to 
position patients for radiation therapy. Again, a model of the patient's internal 
anatomy is produced from presurgical imaging. The model is used to plan the 
geometry of radiation therapy beams. Because of the non-invasive nature of 
radiation therapy it is difficult to transfer the beam geometry planned using the 
model to the actual patient. A P ol h emu s3S pace tracker and localizer are used 
asa magnetic3D digitizer to measure the surface of the patient. Themodel and 
the measured surface arethen registered using 3D/3D surface fitting. Oncethe 
registration has been performed, the magnetic digitizer is used again to mark 
the patient for setup. The intersections of the three principle planes with the 
patient are traced. These marks are then used as reference for positioning 
the therapy machine. The therapy machine must be aligned manually. If 
the patient moves the entire calibration need not be reperformed, however the 
therapy machine must be realigned with the reference marks on the patient. 
Accuracies of ~ lmm and ~1° are reported. 

2.1.4 Massachusetts I nstitute of Technology 

A group at MIT's Artificial Intelligence Laboratory has developed "An Auto- 
matic Registration Method for Frameless Stereotaxy, Image Guided Surgery, 
and Enhanced Reality Visualization"[Grimson etal., 1994]. As in the previous 
work described, a model of the patient's internal anatomy is produced from 
presurgical imaging. A Technical Arts laser range scanner is used to collect a 
set of 3D data points from the patient's skin surface. The model and the laser 
data are registered using 3D/3D surface matching. A special calibration object 
is used to calibrate the laser scanner and calibrate the location of a camera on 
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the laser scanner. Once the model is registered with the laser data and the 
location of the laser scanner camera is calibrated, the model can be overlayed 
with video of the patient. The calibration must be reperformed if the patient 
or camera move and the overlay can only be generated for a single viewpoint. 
The reported accuracy of this method is~lmm. 



2.1.5 Stanford 

A group at Stanford University has developed 'Treatment Planning for a Ra- 
diosurgical System with General Kinematics" [Schweikard et al., 1994]. The 
method is used to plan and perform radiosurgery. A model of the patient's 
internal anatomy is produced from presurgical imaging. The radiosurgery is 
planned using the model. In addition, the model is used to synthesize ra- 
diographs from different view points. A total of over 400 such radiographs 
are produced. During the radiosurgery, two nearly orthogonal X-ray images 
of the patient are taken and compared with the precomputed radiographs to 
determine the patient's position and orientation with respect to the treatment 
machine. The patient's position and orientation can be determined about twice 
per second. Thetreatment machine (X-ray system, radiation source, etc.) must 
be calibrated separately using a special calibration routine. The accuracy of 
this method is not reported. 



2.1.6 University of North Carolina 

A group at the University of North Carolina has developed a method for "Merg- 
ing Virtual Objects with the Real World" [Bajura et al., 1992]. This system 
allows the user to see ultrasound imagery overlaid on a patient in near real- 
time. A six degrees of freedom (DOF) Polhemus 3Space tracker is mounted 
on the probe used to acquire ultrasound images. A second 6DOF tracker is 
attached to the head-mounted display (HMD) used to view the overlay. Images 
from a camera a I so mounted on theHMD are combined with the ultrasound im- 
ages to produce the overlay. Si nee both trackers report position and orientation 
it is a simple matter to transform between ultrasound tracker coordinates and 
HMD tracker coordinates. I n order to transform ultrasound images into the 
coordinate system of the HMD camera the relationships between ultrasound 
i mages and the ultrasound tracker and the HMD camera and the HMD tracker 
must be establ i shed. A sped al cal i brati on j i g i s used to determi ne these trans- 
formations periodically. This system allows for motion of both the user and the 
patient. The accuracy of the overlay is not reported. 
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2.2 Other Appl i cati ons 

2.2.1 Boeing 

A group at Boeing has developed "An Application of Heads-Up Display Tech- 
nology to Manual Manufacturing Processes" [Cau del I and Mizell, 1992]. The 
goal of this work is to overlay manufacturing instructions on images of the 
manufacturing process and display them to a worker using an HMD. The in- 
structions to be overlayed are derived from a CAD model. The system uses a 
6DOF Polhemus 3D Isotrack magnetic tracking system attached to the HMD 
to generate the overlay. A calibration jig is used to establish the relationship 
between the H M D and the work site. Given the relationship between the H M D 
and the work site an overlay can easily be generated. The accuracy of this 
system is not reported. 



2.2.2 Columbia University 

A group at Columbia U niversity has developed a method for "Knowledge-Based 
Augmented Reality" [Feiner et al., 1993]. The goal of this work is to overlay 
instructions for repairing a laser printer with images of the laser printer. The 
instructions are derived from a knowledge-based system. A Logitech 3D ultra- 
sonic tracking system and an Ascension Technology magnetic tracking system 
are used to determine the position and orientation of an HMD and several key 
parts of the laser printer. Using the position and orientation information from 
the tracking system an instruction overlay is generated and displayed to the 
user via the HMD. Frame rates of about 15hz are reported. Accuracy is not 
reported. 



2.3 Discussion 

As discussed in Section 1.4 the transformation which maps model coordinates 
to image coordinates can be divided into several pieces. All of the methods 
discussed above takethis approach and all of them calculatea transformation 
which registers the model with some world (reference) coordinate system. Ini- 
tially, our discussion will consider only this piece of the larger transformation. 
Current methods of registering the model with the world coordinate system 
are somewhat limited. Many of the approaches use magnetic trackers to deter- 
mine the transformation which will align the two coord in ate frames. Magnetic 
trackers have several significant shortcomings which limit their effectiveness 
for enhanced reality visualization. Magnetic trackers havea very short range, 
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typically a few feet. Accuracies are limited to~6mm and ~1.5 . 1 Perhaps the 
most significant limitation is that magnetic and metallic objects can severely 
degrade the performance of magnetic trackers. This is a significant limitation 
for surgical applications as most operating rooms are loaded with metallic and 
magnetic objects. In addition, current advances in intra-operative imaging 
have made it possible to obtain MR images of a patient's internal anatomy 
during surgery. This environment precludes the use of magnetic trackers. 

Several other types of sensors are also used to determine the transformation 
which will register the model with the world coordinate system. These sensors 
also have I imitations which makethem less than desirablefor enhanced reality 
visualization. Ultrasonic trackers haverangeand accuracy limitations similar 
to those of magnetic trackers. While they are not susceptible to magnetic 
interference they are limited to line of sight operation. Mechanical digitizers 
have good accuracy but are cumbersome and have limited range. The laser 
scanner provides very accurate position information but also has a short range 
and is limited to line of sight operation. 

All of the methods cited above use active sensors to collect the data required 
to register the model with the world coordinate system. This requires either a 
special environment (mechanical digitizers, magnetic trackers and ultrasonic 
trackers) or a cumbersome piece of equipment (laser scanner and X-ray). There 
are also many situations where active emissions are not desirable. 

The registration produced by most of the medical applications (the excep- 
ti ons arethe work bei ng done by the groups at the U ni versity of N orth Carol i na 
and Stanford University), is limited to a fixed patient and sensor configuration. 
They do not allow for patient motion. This is because the alignment between 
the patient and the world coordinate system is implicitly determined during the 
initial calibration phase and is not monitored. It is not clear that it is possible 
to extend these methods to allow for motion. [Grimson et al., 1994] claim that 
their method is extensible to cover patient motion, however it isnot clearthat it 
is practical or possibletodosousinga laser scanner. In a surgical environment, 
with all but the surgical site hidden under sterile drapes, it is doubtful that 
enough patient surface area will be visibleto the scanner to allow registration 
of the patient and model. In addition, using a laser in the operating room might 
be di stracti ng to the surgeons. 

These methods also require a high degree of operator action to register the 
model with the patient. Typically the operator must measure data points by 
hand or edit data that was semi-automatical ly collected. At least one of the 
methods requires about half an hour to produce a single registration. 

Whileall of the medical applications register a model obtained from presur- 
gical imaging to a world coordinate system, only two of the applications (Mas- 
sachusetts Institute of Technology and University of North Carolina) actually 



1 These accuracies are for a sensor located between 10 and 70cm from the source. 
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produce enhanced reality visualizations. Neither of these methods can handle 
dynamic changes in focus, zoom or aperture of the camera used to obtain the 
raw image for the enhanced reality visualization. And only the method pro- 
duced by the group at the University of North Carolina allows the surgeon to 
change the viewpoint of the enhanced reality visualization. 

None of the current approaches to enhanced reality visualization are partic- 
ularly well suited to a surgical environment. The sensors used are active and 
arefrequently cumbersome. A good solution for a surgical environment should 
allow for patient motion and should allow the surgeon to see enhanced reality 
visualizations from different view points. It should be fully automatic follow- 
ing a straight forward initial calibration. And the method should allow for 
dynamic changes in the camera parameters. We will consider optical sensors 
for a number of reasons. Small, light weight and inexpensive video cameras 
are readily avail able. Very accurate results can be achieved with thesecameras 
which can operate over long ranges. Further, we will obtain the information 
requi red to regi ster the model with the raw i mage enti rely from the raw i mage. 
This has the advantage of ensuring that registration information is available 
exactly when raw images are avail able to produce an enhanced reality visual- 
ization. Sincethegoal of enhanced reality visualization istoenhancean image, 
the raw image will likely contain a lot of information valuable to performing 
the enhancement. Almost all of the current approaches ignore the information 
contained in the raw image opting for what is essentially a closed loop solution. 

Recently [Wei I set a I., 1993] proposed a met hod of enhanced reality visualiza- 
tion using video information exclusively, however the method requires manual 
alignment of the model with the video image and in some cases requires mark- 
ers to appear in both the MR image and videoimage. An optical tracker has also 
been proposed by a group at the U ni versity of N orth Carol i na [Wang et al ., 1990, 
Wardetal., 1992, Gottschalk and Hughes, 1993,Azuma and Bi ship, 1994]. This 
method is essentially another active sensor not unlike a laser scanner. It uses 
a "sea-of-lights" consisting of nearly 1000 LED's mounted in the ceiling tiles 
of 10' by 12' room. Three cameras mounted on an HMD are aimed at the ceil- 
ing while the LED's are flashed sequentially. This "optical tracker" requires a 
special ceiling which must be calibrated and a significant amount of additional 
hardware (3 extra cameras, LED control, etc). Neither of these, proposals are 
suitable in our application for the reasons cited above. 



Chapter 3 

Camera Calibration 

3.1 Camera Model 

The pin-hole camera is frequently used to model the transformation from world 
coordinates to image plane. The pin-hole model uses the perspective projection 
model of image formation. Orthographic projection with scale or weak perspec- 
tive is used in many computer vision applications, however it is not accurate 
enough across the enti re i mage for ou r appl i cati on. F or exampl e consi der an ob- 
ject with 5cm of depth located 100cm away from the camera and 5cm off-axis, 
see Figure 3-1. Under orthographic projection points a and b are collocated, 
however in a real image (using a 25mm lens) the points are 5 pixels apart. The 
effect is significant even with the object only slightly off center. The left side 
of Figure 3-2 shows a camera centered Cartesian coordinate system. The optic 
axis of the camera is coincident with the,? axis. The image plane is parallel to 
thexy plane and located a distance/ from the origin. Even though the image 
plane is not required to be pa rail el to thexy plane, most camera models do not 
explicitly consider this possibility. Image plane pitch 9 X and tilt 9 y are usually 
reflected in pose. We will start with the assumption that 9 X = 9 y = and then 
in Chapter 4 we will show how our method implicitly models image plane pitch 
and ti 1 1. The poi nt where the i mage pi ane and the opti c axi s i ntersect i s known 
as the principal point. Under perspective projection a point P c = [x c y c z c ] 
projects to poi ntp = [x y] on the image pi ane by the foil owing equations 1 : 

x = f- (3.1) 

y = f- (3.2) 

Zc 

Unfortunately, we are not able to directly access the image plane coordinates. 
Instead we have access to an array of pixels in a frame buffer or computer 



x We represent points as row vectors rather than column vectors. This means that points 
will be premulti plied instead of postmultiplied when applying a transformation. This is the 
exact opposite of what is typically used in computer vision, however it is the notation that the 
author was first exposed to and what has stuck. 
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Figure 3-1: Effect of orthographic projection assumption. 
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Figure 3-2: Transformation from world coordinates to the image plane of an 
ideal pin-hole camera. 
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memory. I n order to understand the relationship between image plane coor- 
dinates and the array of pixels, we must examine the imaging process. The 
image plane of a CCD camera is a rectangular array of discrete light sensitive 
elements. The output of each of these elements is proportional to the amount 
of light which falls on it. The values for each of these elements are read out one 
element at a time row after row until the entire sensor array has been read. 
This analog signal is processed by a frame grabber which converts the camera's 
output to digital values and stores them in the frame buffer. The result is an 
array of digital values in the memory of the computer. The rows of the array 
correspond to the rows of the image sensor. I n general, this is not the case 
for the columns. A synchronization signal is provided between rows, however 
the frame grabber samples the signal within a row asynchronously and at an 
independent frequency which may result in a different number of columns per 
row than were present in the camera. Synchronization errors can also cause 
the rows to not lineup. This is known as pixel jitter and in extreme cases can 
cause the x and y axes to appear n on -orthogonal or skewed [Lenz and Tsai, 
1988]. Most camera models omit skew angle 6^ (the angle between thex and y 
axes minus 90°). We will start with the assumption that 6^ = Ofor simplicity 
and then in Chapter 4 we will show how our method implicitly models skew 
angle. A single element of the array in memory is commonly called a pixel 2 . 
We will refer to the row and column number of a given pixel asy' and a:' respec- 
tively. Several parameters are defined to quantify the relationship between 
the array in memory and the coordinate system of the image plane. x and yo 
are the pixel coordinates of the principal point. s x and s y are the number of 
pixels in memory per unit distance in thex and y direction of the image plane. 
These parameters along with /, Xl 9 y and 6^ are intrinsic or internal camera 
calibration parameters. The projection of poi nt P c to poi nt p' = [x' y'] in memory 
is described by the foil owing equations: 

(3.3) 
(3.4) 

The right half of Fi gure 3-2 shows an arbitrary Cartesian coordinate system 
which we will refer to as the world coordinate system. A point P w in the world 
coordinate system is transformed into the camera centered coordinate system 
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2 As noted above pixels in memory can differ in size from the underlying image sensing 
elements. Many of the measures of accuracy cited both in this work and in the literature 
should actually be made relative to the image sensing elements. This is straight forward for a 
calibrated camera. We are working with uncalibrated cameras and the relationship between 
image sensing elements and pixels in memory is frequently not known. Thus for simplicity and 
in spite of the preceding, we will express our measures in terms of pixels in memory. 
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by the following equation: 

P C = P W K + T (3.5) 

where TZ is an orthonormal rotation matrix and T is a translation vector. TZ 
and T together are commonly referred to as the extrinsic or external camera 
parameters. 

The pin-hole model assumes an ideal camera. Because of lens distortion, 
real cameras deviate from ideal. The major categories of lens distortion are: 

1. Radial distortion -the path of a light ray traveling from the object to the 
image planethrough the lens is not always a straight line. 

2. Decentering distortion - the optic axis of individual lens components are 
not always col linear. 

3. Thin prism distortion - the optic axis of the lens assembly is not always 
perpendicular to the image plane. 

When it is necessary to explicitly model lens distortion it is typically sufficient 
to model only radial distortion. Our method, developed in Chapter 4, implic- 
itly models a linear approximation to lens distortion. 3 Using a 25mm lens of 
average quality we have not found it necessary to explicitly model lens distor- 
tion. The maximum radial distortion at the extreme edge of the image for our 
configuration is just a few pixels. 

3.2 Current Camera Calibration Techniques 

Research into the camera calibration problem has a long history originating in 
the field of photogrammetry For a more complete discussion of camera calibra- 
tion techniques seetSlama, 1980, Tsai, 1987]. Camera calibration techniques 
can be divided into three different categories: 

1. Methods which recover only intrinsic parameters. These methods gen- 
erally require a special calibration object or stand to allow the internal 
parameter(s) to be measured independent of other parameters. Also, 
these methods assume that the intrinsic parameters do not change. Un- 
less extreme care is taken to ensure otherwise, it is almost certain that 
the intrinsic parameters will change perhaps by a significant amount fol- 
lowing calibration. Examples of these methods include [Brown, 1965, 
Lenz and Tsai, 1988, Maybank and Faugeras, 1992]. 



3 A discussion of this characteristic can be found in Appendix A. 
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2. Methods which recover only extrinsic parameters (also known as geomet- 
ric methods). These methods assume that the intrinsic camera param- 
eters are precisely known. This is not always possible for the reasons 
mentioned above. Examples of these methods include [Church, 1945, 
Fischler and Bolles, 1981]. 

3. Methods which recover both intrinsic and extrinsic parameters. These 
methods can be further divided into two categories: 

(a) Nonlinear optimization methods. These methods are both nonlinear 
and iterative. These methods typical lyproducethe most accurate re- 
sults but require a large number of features and a significant amount 
of time. Further they are frequently not automatic and need a good 
initial solution to ensure convergence. Finding an initial solution 
can be a difficult problem. Examples of these methods include [Faig, 
1975, Sobel, 1974]. 

(b) Linearization methods. These methods linearize the nonlinear pro- 
jection equations by introducing additional constraints. The basic 
difference between members of this category is how the problem is 
linearized. If care is not taken when the equations are linearized 
significant bias can be introduced. Many of these methods are it- 
erative and under certain conditions fail to converge. These meth- 
ods tend to be significantly faster than the nonlinear optimization 
methods. They frequently do not require an initial solution or can 
calculate one with relative ease. These methods still require a large 
number of points for good results. Examples of these methods in- 
clude [Faugeras and Toscani, 1987, Grosky and Tamburino, 1987, 
Tsai, 1987, Goshtasby, 1987, Ganapathy 1984]. 

None of the current solutions to the camera calibration problem are ide- 
ally suited to enhanced reality visualization. The linearization methods come 
closest to meeting the requirements of enhanced reality visualization, however 
there is room for improvement. 
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Chapter 4 
Our Solution 



We have developed a novel method for determining the relationship between 
model and image coordinates. Figure 4-1 provides an overview of our method. 
Two key insights lead to a significant simplification of the problem. These 
insights are: 

1. It is not necessary to separate the intrinsic and extrinsic parameters for 
enhanced reality visualization. 

2. 1 1 i s possi bl e to recover depth i nfor mati on from a si ngl e 2D i mage. 

Utilizing these insights produces a solution that is particularly well suited 
to enhanced reality visualization and meets the requirements discussed in 
Chapters 1 and 2. Our solution is most closely related to the linearization 
methods described in Chapter 3. 
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Figure4-1: Overview of this report showing the circular fiducials. 
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CHAPTER 4. OUR SOLUTION 



4.1 A Perspective Transformation is Enough 

The camera calibration problem is typically posed as follows: 
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(4.1) 



C 



M isa matrix of model points of theform [X Y Z 1],I is a matrix of image points 
of the form [x y z] resulting from the projection of M onto the image plane, X is 
an external camera calibration matrix, TZ is an orthonormal rotation matrix, T 
isa translation vector, andC is an internal camera calibration matrix. Image 
points are expressed in homogeneous coordinates to allow the perspective pro- 
jection to be captured using linear equations [Duda and Hart, 1973]. The pixel 
coordinates of an image point [x' y'] are determined by the following relation- 
ships: x' = x/z and y' = y/z. These relationships are very similar to (3.3) and 
(3.4). I n fact, x, y and z are anal ogous to the camera centered coordi nates x c , y c 
and z c . 

The ultimate goal of most camera calibration is to enable the recovery of 
metric 3D information, such as the pose (position and orientation) of an object, 
from its two dimensional image. Clearly, to recover the pose of an object it 
is necessary to separate the intrinsic and extrinsic parameters. Separating 
the parameters is difficult [Ganapathy 1984]. The problem is nonlinear and 
several of the parameters are closely coupled. In the presence of noise a single 
solution to the camera calibration problem does not exist, rather there exists 
a set of solutions. These solutions can differ significantly and yet give rise 
to nearly identical images. For example, in the presence of noise significant 
trade offs can be made between t z and /. This can result in a solution which 
looks good from one view point but where neither the intrinsic nor extrinsic 
parameters are correct. The fact that the optimal solution for one view point 
may not be the globally optimal solution is at the heart of what makes camera 
calibration hard. 
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Camera calibration is often performed as a preliminary step in many appli- 
cations. A set of camera calibration parameters is recovered and the intrinsic 
parameters are used for future images. This assumes that a globally optimal 
set of intrinsic parameters has been recovered and that the parameters are 
fixed. By using multiple images or multiple calibration planes the solution can 
be improved in a global sense. However, in the presence of noise small errors 
can lead to large errors for view points significantly different than those used 
during calibration. Further, the intrinsic camera calibration parameters are 
not fixed. They change with the focus and aperture settings. For example, the 
principle point can shift by8pixelsor more with adjustments to focus [Will son 
and Shafer, 1993]. The effective focal length / also varies with focus and aper- 
ture settings. Zoom lenses take this variability to an extreme, enabling large 
changes to /. Lens distortion also varies with changes to focus and aperture 
[Brown, 1965]. 

In enhanced reality visualization we are interested in the total transfer ma- 
ti on from model to i mage coordi nates. We do not need to separate i ntri nsi c and 
extrinsic parameters to generate an enhanced reality image. All of the param- 
eters comprising all of the intrinsic and extrinsic calibration parameters can be 
composed into a single 3x4 matrix. This insight is not new, but how we apply 
it is. The following matrix equation is equivalent to (4.1) and the combination 
of (3.3), (3.4) and (3.5). 



I = MV 
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For our purposes finding values for the elements of V is sufficient. 
A more general internal calibration matrix can be defined as follows: 
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This new definition of C has 9 degrees of freedom. These degrees of freedom 
correspond to x , y , s x , s y , f, 9^, 9 X , 9 y and 9 Z . Only 5 of these 9 degrees 
of freedom are unambiguous. s x , s y and / actually constitute 2 degrees of 
freedom. This is equivalent to saying that C is only defined up to a scale factor. 
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9 X , 9 y and Z are redundant degrees of rotational freedom which are captured 
in X. Therefore the internal calibration matrix used in (4.1) needs only slight 
modification: the addition of a skew parameter 6^. The skew parameter is 
added in the 1 st column, 2 nd row of C. The value of the skew parameter is 
equal totan^xy or change in thex coordinate based on they coordinate. With 
the addition of this parameter to C the perspective transformation (XC) matrix 
becomes: 



V 



rnsxf + rn tan 6»xy + rnXQ r 12 s y f + ri 3 j/ ^13 

ri\s x f + r 2 2 tan 6»xy + r 2 3^0 r 2 2Syf + r 2 3V0 r 2 3 

ms x f + m tan dxy + r 33 xo r 32 s y / + r 33 j/ ^33 

txSxf + ty tan 9xy + t z XQ tySyf + t z y t z 



(4.5) 



V can exactly model n on -orthogonal frame buffer axes (0xy ^ 0) and per- 
spective projection with the image plane not perpendicular to the optic axis 
{Ox and/or 9 y ^ 0). A linear approximation of radial distortion can also be 
obtained 1 . Notice that the revised definition ofC has 5 degrees of freedom and 
when combined with the 6 degrees of freedom contained in X accounts for all 
11 degrees of freedom in V. Thus V implicitly models 5 intrinsic and 6 ex- 
trinsic parameters. The modeling is implicit because the underlying physical 
parameters are never actually computed. By formulating the problem in this 
manner, we avoid the difficulties associated with decomposing the intrinsic and 
extrinsic camera parameters. We solve (4. 2) for each image we obtain. Thus we 
do not need to worry about finding a globally optimal solution, optimal for this 
view point is sufficient. Further, changes to the intrinsic camera parameters 
are inherently captured. 



4.2 Depth I nformation F rom a Si ngle 2D I mage 

Even with the simplifications made so far, the problem of solving for the per- 
spective transformation which maps model coordinates to image coordinates is 
still nonlinear. While it is possible to solve for the elements of V using a mini- 
mum of 6 point features and an iterative method, we can do better. Expanding 
(4.2) produces: 



x 1 = x/z 

pilX + p 2 \Y + P31Z + p4i 
P13X + P23Y + P33Z + P43 



1 A discussion of this characteristic can be found in Appendix A. 
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pi 2 X + P22Y + P32Z + P42 
P13X + P23Y + P33Z + P43 



(4.7) 



If we knew the value of z = pi3X+p 2 3Y+p33Z+p A 3ther\ solving for the elements 
of V would reduce to 3 sets of linear equations. Using spatial features, rather 
than point features, enables us to measure a quantity that is proportional 
to z for each feature. We call this quantity the local scale factor. The local 
scale factor is equal to the focal length of the camera divided by the depth of the 
feature {s y f/z). We use a circular fiducial as our spatial features [Landau, 1987, 
Thomas and Chan, 1989, H ussai n and Kabuka, 1990, Chaudhuri and Samanta, 
1991, Safaee-Rad et al., 1992]. The exact nature of these fiducials will be 
discussed in Chapter 5. I n essence, by using a spatial feature we are able to 
recover 2|D information from a single video image. The idea of using spatial 
features is not new, however our use of the information provided by spatial 
features is. We will modify our matrix equation slightly by multiplying /and V 
by jj. Since/ isexpressed in homogeneous coordinates, multiplying/ and/or V 

by an arbitrary constant has no effect on the solution (^yV andP represent the 
same solution). We will refer to the elements of -±jV as» 8? anddefineJ'= -^1. 
J' is a matrix of image points of the form [x* y* 1/s]. s is the local scale factor at 
the image point. The pixel coordinates of an image point [x' y'] are determined 
by x' = sx* and y' = sy*. x*, y* and 1/s are similar to the homogeneous image 
coordinates defined in (4.1). The elements of V can be solved for using the 
foil owing three sets of linear equations and as few as four spatial features. 



x* = {pnXi :+ mYi :+ P3 iZ t :+ p A1 ) (4.8) 

y* = (puX, + P22 Y l + p 32 Z l + P4 2 ) (4.9) 

1/s," = (P13X, + P 23Y l + P 33Z l + P 43) (4.10) 

Where x*, y* and 1/s, are the components of the z th image point and X it Y % and 
Zi are the components of the z th model poi nt. 

4.3 I mpl ementati on 

It should be noted that by using a spatial featurethe problem of determining the 
relationship between model coordinates and image coordinates becomes linear 
and the solution can be found using as few as four features. Also, since all of 
the calibration parameters (both intrinsic and extrinsic) are bundled intoP we 
are not required to make any assumptions about the stability of the intrinsic 
parameters. This is important in dynamicenvironmentsbecausechangestothe 
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focus, aperture and/or zoom will likely be needed during the enhanced reality 
visualization. 

Given a model and an image containing at least four fiducials with known 
model coordinates it is a relatively straightforward task to solve for V and 
generate an enhanced reality image. The basic steps are as follows: 

1. Grab an image 

2. Locate the fiducials in the image and calculate the local scalefactor 

3. Establish correspondence between fiducials in the image and fiducials in 
the model 

4. SolveforP 

5. U si ng V, project the model i nto the i mage 

6. Display the enhanced reality image 

7. Repeat 

4.3.1 I mage Acquisition 

720 x 480 pixel images are acquired using a PulnixTMC-50 color CCD camera 
or a Panasonic WV-CD50 monochrome CCD camera with either a 25mm or 
16mm lens or a CI DTE K monochrome CID camera with a 29mm lens and a 
Sun VideoPix frame grabber. Both the cameras and frame grabber are rela- 
tively inexpensive commodity items. VideoPix is only capable of grabbing ~4 
monochrome or ~1 color frame per second. This severely limits the update rate 
of the enhanced reality visualization. Furthermore, VideoPix only provides 7" 
bits of luminance information. Even though pixel values range from to 255 
the actual resolution is less than half of this range. We intend to upgrade to 
a better camera/frame grabber combination in the future, however the current 
combination is sufficient to demonstrate our method. 

4.3.2 Fiducial Location 

The location (centroid) and local scalefactor (semi-major axis of the fiducial 's 
image divided by the radius of the fiducial) are calculated using moments. 
Chapter 5 describes these calculations in detail. 

4.3.3 Correspondence 

Once an initial correspondence has been established, it should be possible to 
maintain correspondence by tracking the fiducials. The idea is that if images 
can be processed fast enough the locations of the fiducials should not change 
very much. Given the correspondence from the last image, we look for fiducials 
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in the new image within a small region around each fiducial's last location. If 
exactly one fiducial is found in that region then the correspondence for that 
fiducial is maintained. If at least some minimum number of correspondences 
(> 4) are maintained, then the fiducials have been successfully tracked. If 
correspondence is lost or no previous correspondence exists than an initial 
correspondence must be established. The initial correspondence is performed 
using a modified version of the alignment method [Huttenlocher, 1988]. The 
alignment method is modified to use scale information as well as some orien- 
tation constraints to significantly prune the search space. The three major 
constraints used are listed below: 

• Each fiducial is visible from only one side. Specifically, the dot product 
of fiducial's normal and the viewing direction must be negative or the 
fiducial is definitely not visible. 

• The local scale factor s t establishes the relative depth of the fiducials up 
to the accuracy of the measurement. 

• The local scale factor s t is used to effectively un project the image point 
[x' % y[]. Recall that x\ = x*s t and y\ = y*s t . \fC is close to a diagonal matrix 
or if a reasonable guess exists for at least some of the intrinsic camera 
parameters, 2 then x* and y* can be treated as the x and y components 
of the camera centered coordinates for the z th point. Since scaling is not 
allowed between world coordinates and camera centered coordinates any 
transformation between the two must have a scalefactor close to unity. 

The alignment method uses triples of model and image points to generate 
possible transformations from model to image coordinates. There are a total 
of ( 3 )( 3 )3! different sets of triples where m is the number of model points 
and i is the number of image points. In general the alignment method pro- 
duces 2 solutions for every set of model and image points. This is because the 
alignment method assumes orthographic projection and is therefore unable to 
resolve reflections about the xy plane. For an image and a model both con- 
taining 7 points the alignment method generates & 15,000 possible solutions. 
Utilizing the constraints listed above significantly reduces this number. For 20 
random views of an object with 7 fiducials, the number of possible solutions was 
reduced from 15,000 to an average of 100. For one of the views the constraints 
reduced the number of possible solutions to 3. Some of these views are shown 
in Chapter 7. Theconstraintsareapplied using only theset of three imageand 
model points and before the remaining model points are transformed or global 
constraints are checked. This reduces the computational cost of establishing 
correspondence for an object with 7 fiducials by over 2 orders of magnitude. 



2 ln our experience only xq and j/o need to be estimated and it is sufficient to use the geometric 
center of the image plane as a fixed estimate of their values. 
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Pseudocode for establishing correspondence follows: 

1. If correspondences > minimum => done. 

2. If < correspondences < minimum => establish the required additional 
correspondences. 

3. Otherwises establish correspondences. 

(a) Find candidate transformations from model to image points. 

i. Take each possible triple of model points and pair it with each 

permutation of three image points, 
ii. For each group of model and image points check that the model 

points are consistent and determine from which side they are 

visible. P % and n % are the location and normal of the z th model 

point in the triple. 

• Find the normal tothetriple 

n p = (P 2 - Pi) x (P 3 - Pi) 

• Verify that the model points can be visible simultaneously 

SIGN(n p X ni) = SIGN(n p X n 2 ) = SIGN(n p X ^3) 

iii. Transform (rotate and translate) the model points of the triple 
so that thefirst point is at the origin and the points lie in thexy 
plane. Therearetwotransformations which will accomplish this, 
choose the one which will makethe model points right side up as 
determined in Step 3(a)ii. Call the transformed model points M*. 

iv. Unproject the image points of the triple and translate so that the 
first point is at the origin. Call the transformed image points 7*. 
v. Calculatethetransformation(s) ^ which maps M* to 7* using the 
alignment method. 

vi. Check that the solution computed in Step 3(a)v is consistent. 

• Use relative depth constraints to eliminate one of the two 
solutions. 

• Verify that the solution does not turn the model points upside 
down. Step 3(a)ii ensured that the model points were right 
side up. As long as the transformation calculated in Step 
3(a)v does not rotate more than 90° about any axis in the xy 
planethe model points will remain right sideup. i*istheunit 
vector in the,? direction. 

SIGN(||OT)>0 
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• Verify that the scale factor is close to unity 

vii. If the transformation computed in Step 3(a)v is consistent, com- 
pose the total transformation using the transformations from 
Steps 3(a)iii and 3(a)v and the translation from Step 3(a)iv. 

(b) For each consistent transformation determine correspondences be- 
tween the remaining image and model points and calculate the cost. 

i. Transform the remaining model points into camera centered co- 
ordinates. 

ii. For each transformed model point 

A. Find the closest image point 

B. If the closest image point is close enough, add the correspon- 
dence to the match and add the Euclidean distance squared 
to the total cost. 

iii. Return a match consisting of a list of correspondences and the 
total cost. 

(c) Consolidate matches and find the best one. 

i. For each unique list of correspondences create a consolidated 
match consisting of: 

• The list of correspondences. 

• The number of matches containing the list of 
correspondences. 

• The minimum cost among the matches containing the list of 
correspondences. 

ii. Return the best consolidated match which is the one with the 
largest number of correspondences or the largest number of 
matches or the lowest cost, in that order. 

Thealgorithmused in Step 2 to establish partial correspondences is very similar 
to that described in Step 3. If less then three correspondences exist, additional 
pai rs of model and i mage poi nts are combi ned with the exi sti ng correspondences 
to form sets of three model and three image points. If three or more correspon- 
dences exist then triples of the established correspondences are used to form 
thesets. The rest of the algorithm isunchanged. The ability to perform partial 
correspondence greatl y si mpl ifies the probl em of reestabl i shi ng correspondence 
when most but not all of the fiducial s are temporarily occluded. If at least the 
minimum number of correspondences are maintained partial correspondence 
is not used to find correspondences for fiducial s that were occluded but are now 
visible. This case is handled nicely as part of locating the fiducial s described in 
Chapter 5. 
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4.3.4 Solving for V 

Equations (4.8), (4.9) and (4.10) are used to solve for the elements of V. If 
correspondences have been established for more than four fiducial s than an 
over-determined system of linear equations exists. We solve this problem in 
a least-squares fashion by minimizing the foil owing error terms using House- 
holder's QR decomposition [Watkins, 1991]. 

n 
lkl|| 2 = EK-(Pll^'+^2lK+P31^+P4l)| 2 (4.1D 

n 
INIz = El^-(P12^'+P22l1+P32^+P42)| 2 (4.12) 



NI2 = Y, 

i=\ 



1 

{pttXi + p23Y t + P33Z t + J»43) 

5,: 



(4.13) 



Householder's method exhibits good stability and is relatively efficient. Com- 
puting the QR decomposition requires approximately nm 2 - m 3 /3 flops and 
using the decomposition to solve for the unknowns requires approximately 
2nm - m 2 /2 flops where n is the number of unknowns and m is the number of 
equations. Splitting the problem into three sets of equations has a significant 
computational advantage. Equations (4.8), (4.9) and (4.10) can be rewritten in 
matrix form as follows: 

X* = MV\ (4.14) 

y* = MV 2 (4.15) 

S = MV 3 (4.16) 

X*, F* and S are column vectors whose z th components arex*, y\ and l/s t respec- 
tively. M is a m x 4 matrix whose z th row isthez th model point [X % Y % Z % 1]. V % is 
the z th column of V. Matrix M is decomposed into the matrices Q and R. Since 
M is common to all three equations we need only perform the decomposition 
once. We can simply reuse the decomposition for the remaining equations. In 
essence, you pay for solving (4.14) and you get the solutions to (4.15) and (4.16) 
for very little. Formulating the problem as three sets of linear equations each 
with 4unknowns reduces the complexity of computing the decomposition by an 
order of magnitude and solving for the unknowns by half an order of magni- 
tude compared to solving a single set of linear equations with 12 unknowns. In 
fact the OR decomposition need not be recomputed until the correspondences 
change, further increasing the computational savings. 

4.3.5 Creating the Enhanced Reality Image 

Creating the enhanced reality image consists of two steps: making a virtual 
image (rendering the model) and combining the virtual image and the raw 
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image. V is used to project the model into an initially empty virtual image. 
Currently models consist of either a collection of points or a collection of line 
segments. Poi nts are projected by multiplying by V and then rounding off tothe 
nearest pixel. Line segments are handled by using V to project theendpoints 
into the virtual imageandthen drawing a line between them. Noanti-aliasing 
or z-buffering is performed. As a result, some edges are jagged and some 
points which perhaps should not be visible are. Once the virtual image has 
been generated it must be combi ned with the raw i mage to form the enhanced 
reality image. Pixels in the virtual image have precedence over pixels in the 
raw image. If a pixel in the virtual image is nonzero (zero being no information) 
then its value is placed in the corresponding location in the enhanced reality 
image. If a pixel in the virtual image is zero then the corresponding pixel in 
the raw image is used. Both the rendering and combination routines are a 
bit simplistic, but are sufficient to demonstrate our method. Rendering and 
combination are important problems, however they are not the focus of this 
work. 

4.3.6 Displaying the Enhanced Reality Image 

The enhanced reality image is displayed as a single image on a high resolution 
CRT using the X window system. The size of the image to be displayed and 
whether it is to be displayed on the local machine greatly affects the time it 
takes to display an image. I n the current implementation, about 20% of the 
computation time is spent simply putting a half sized enhanced reality image 
on the screen. Creating a stereo display or using a video see-through HMD are 
straightforward extensions of our method. 

4.3.7 Discussion 

The current system is implemented in Lucid Common Lisp and runs on a 
SparcStation 2. Currently, the two most limiting components are the frame 
grabber and the rendering/display system. Depending on the complexity of the 
model, the renderer may require a minute or more to perform the rendering. 
Using a simple model, frame rates of ~2hz can be achieved. Table 4.1 shows a 
break down of the computational timerequiredforthemajorfunctions. Low end 
SGI machines such as the Indy are capable of grabbing a full size color image 
anddisplayingit on the screen at > 30hz. TheSGI machinesarealsocapableof 
rendering a fairly complex model and displaying it at > 30hz. Ifthesetimesare 
substituted for frame grabbi ng, render i ng and di spl ayi ng a frame rate of ~ lOhz 
results. The frame rate would be ~20hz if the time requirements for frame 
grabbing, rendering and displaying could be eliminated entirely. Little effort 
has been put into optimizing the current implementation for speed. Receding 



44 



CHAPTER 4. OUR SOLUTION 



Function 


Time Required 


Grab Frame 


0.25s 


50.3% 


Find Fiducial sand Calculate 
Centroid and Local Scale Factor 


0.03s 


6.1% 


Check Correspondences 


0.007s 


1.4% 


Calculate Perspective 
Transformation 


0.01s 


2.0% 


Render Model and Create 
E nhanced Real ity 1 mage 


0.1s 


20.1% 


Display Image 


0.1s 


20.1% 



Table 4.1: Time required for major functions running on a SparcStation 2 in 
Common Lisp using a simple model and displaying a half sized enhanced reality 
i mage. 



some portions and using a C-30 digital signal processing board to grab and 
process images should produce significant improvements in speed. We believe 
frame rates of >30hz should be achievable with this modest hardware. 

Assuming that we are given a model to be used in creating the enhanced 
reality image is not unreasonable. In fact, it is a fundamental assumption of 
enhanced real ity visualization. The basic idea is to add information to an image. 
This information in most cases is not visible from the current view point and 
must come from some source other than the raw image (typically the model). 
This is not to say that constructing a model is easy. Model building is simply 
not the focus of this work. Assuming that we know the model coordinates of 
the fiducials in some cases is unreasonable. This amounts to assuming that 
the fiducial s are part of the model. In Chapter 7 we present enhanced reality 
visualizations of a test object and a plastic skull. The model for the test object 
is a CAD-likemodel and the fiducials are part of it. This is not the case for the 
skull. Here, the model is a CT scan of the skull. The fiducials are not present 
in the CT data and are not part of the model. In this case, we need a method 
of determining the model coordinates of the fiducials. Chapter 6 presents the 
detai I s of determi ni ng the model coordi nates for the fiducial s used on the skul I . 



Chapter 5 

Feature Detection and 
Localization 



I n the last chapter we described the theory behind our method. The success 
of any method for enhanced reality visualization is inseparably tied to the 
accuracy of the data used to determine to transformation which maps model 
coordinates to image coordinates. In this chapter we will discuss the practical 
details of finding fiducials and the accuracy with which their position and local 
scale factor can be determined. 

5.1 Details 

The circular fiducials used in our method are detected using pattern matching 
and are localized using moment calculations. An actual size fiducial is shown 
in Figure 5-1. A chord passing near the center of the fiducial will exhibit 
transitions from light to dark, dark to light, light to dark and dark to light. 
Figure 5-2 shows a blow-up of an image of a fiducial and the intensity profile 
of a chord line. Constraints such as the steepness of the transition, the length 
of the transition and the separation between transitions eliminate nearly all 
detections which do not come from actual fiducials. By checking the rows and 
columns of an image for collocated occurrences of this transition pattern the 
presence of an fiducial can be further validated and a rough position can be 
efficiently found. The time required is linear in the size of the image. In 
addition, a bounding box {xi, x 2 , y\ and y 2 ) and an upper and lower threshold 
tapper and tiower) for each fiducial can be readily obtained from this process. 
The bounding box, with vertices \x\ y{\, \x\ y 2 ], [x 2 yi] and [x 2 y 2 ], is slightly 
larger than the smallest rectangle aligned with the axes which can contain the 
fiducial. The upper and lower thresholds are used to rescalethe pixel values. 
These val ues are needed because we use grey scale moments to find the centroi d 
and local scalefactor ofthefiducial. Thelocal scalefactor is the semi -major axis 
of the fiducial 's image divided by the radius ofthefiducial. The bounding box 
and thresholds for one fiducial are completely independent from those of the 
other fiducials. This produces an extremely robust detection and localization 
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o 





Figure 5-1: 
fiducial. 



Actual size 



Figure 5-2: Enlarged image of a fiducial with 
pixel values for a chord shown to the side. 



algorithm. For example, large gradients in average image intensity, such as 
might be caused by shadows, have little effect on detecting and localizing the 
fiducials. 

Fiducials are detected by looking for occurrences of intensity profiles such 
as the one shown in Figure 5-2. The location and the maximum and minimum 
values of these profiles are used to determine a bounding box and an upper and 
lower threshold for the fiducial. Once a fiducial has been detected moments 
are used tocalculatethecentroid and local scale factor of the fiducial. Detailed 
pseudo code used to I ocate fiduci al s fol I ows: 

1. Grab a fresh image. 

2. For each fiducial present in the last image: 

(a) Define a window centered at the last location [x' d6 y' d6 ] with dimen- 
sions equal to 2kr. & is a window size scale factor and r is equal to 
the fiduci al's local seal e factor s t in the last image times the fiduci al's 
radius in model coordinates. 1 

(b) Scan the window for fiducials. 

i. For each horizontal and vertical scan line in the region collect 
possible fiducial detects. 

A. Initialize the following parameters: h through U (location 
of transition 1-4), / min and / max (the minimum and maximum 



1 s x / y is the ratio of pixel spacing in the x and y directions (s«/sy). This quantity is used to 
correct for the fact that pixels generally are not square. The* dimension of the window must 
be scaled by l/s x /y 
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separation between transitions), ^through wo, (width of tran- 
sition 1-4), u> max (the maximum width of a transition), si 
through s 4 (slope of transition 1-4), s min (the minimum in- 
tensity change to be considered above noise), t upper and t| 0W er 
(upper and lower thresholds). 

B. Scan across or down the scan line until the change in pixel in- 
tensity < -s min . Scan back several pixels, takethe minimum 
value and if it is less than t upper store it in t upper . Update h, w\ 
and si. 

C. Continue scanning until the change in pixel intensity is no 
longer < -s min . Scan ahead several pixels, takethe maximum 
value and if it is less than t| 0wer store it in t| 0wer . Update h, wi 
and si. I f wi > ^ max go to Step 2(b)i A. 

D. Continue scanning until the change in pixel intensity is > 
s min . Scan back several pixels, takethe maximum value and 
if it is greater than t| 0wer store it in t| 0wer . Updated, w 2 ands 2 . 

E. Continue scanning until the change in pixel intensity is no 
longer > s min . Scan ahead several pixels, takethe minimum 
and if it is less than t upper store it in t upper . Updated, w 2 and 
•S2- If /min <h-h< /max or s x ^ s 2 go to Step 2(b)iA. 

F. Repeat Steps 2(b)iB and 2(b)iC except update / 3 , ^3 and s 3 . If 
W3 > ^ max or 2(/2 — /1) ?6 (h — l 2 ) or S3 96 s 2 ~ s\ go to Step 
2(b)iA. 

G. Repeat Steps 2(b)iD and 2(b)iE except update U, wo, and s 4 . If 
wq > u>max or (7 4 — h) 7^ 2(/2 — h) ~ (h — h) or s 4 96 S3 ss s 2 sa si 
gotoStep2(b)iA. 

H. Add the detection (location, size and thresholds) to a list of 

detections. 
I. GotoStep2(b)iA 

ii. Consolidate detections. Detections from adjacent scan lines are 
combined if they overlap by at least 50%. Detections from or- 
thogonal scan lines are combined if the detections intersect. All 
consolidations retain the maximum bounding box, the minimum 
Supper and the maximum t| 0wer . 

iii. Expand the bounding box as necessary to ensure the fiducial is 
fully enclosed. This is required because if the fiducial is elliptical 
in shape and is at an angletothex and y axes, the bounding box 
generated by the detections may under estimate the size of the 
fiducial. 

(c) CalculatetheO th , 1 st and 2 nd moments of inertia as well astheEuler 
number of the window using grey scale values. 
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(d) If the Euler number is zero return the the location and local scale 
factor [x' y' 1/5] for the fiducial. 

3. If there was a valid solution for the last image, predict the location and 
size of each model point for which a correspondence did not exist using 
this solution. Perform Steps 2a through 2d. 

4. If the number of correspondences maintained in Step 2 plus the number 
established in Step 3 is less than the minimum, scan the entire image for 
fiducialsand establish correspondences for any new fiducials found using 
the algorithm presented in Section 4.3.3. 

Thezeroth, first and second order moments of a region bounded by x\, x 2l y\ 
and 2/2 can easily be calculated using the following formulas: 

X2 V2 

x=x\y=y\ 

X2 V2 

m >< = Y, Y, p( x >y) x ( 5 - 2 ) 

x=x\y=y\ 
X2 V2 

x=x\y=y\ 
a?2 3/2 

m x 2 = J2 J2 p( x ,y) (V + ? x) (5.4) 

x=x\y=y\ 
X2 V2 

m xy = J2 J2 p( x >y)( x y + z xy) ( 5 - 5 ) 

x=x\y=y\ 
X2 V2 

m„2 = 



r - 

x=x\y=y\ 



E J2 p( x ^y){ x +h) (5-6) 



x and y are image coordinates and p(x,y) is a weight based on the valued, y) 
of the pixel at i mage coordi nates x, y. z x , «xy and z y arethe moments of inertia for 
an individual pixel about the center of the pixel. The weights are determined 
by the following function. 



p{x,y) 



\fv(x,y) > t U pper 

1 \fv(x,y) < tiower (5.7) 

t^-v(x,y) otherw j se 
tupper-t|ower 



The Euler number of the region can be used to verify that only a single object 
with a single hole is present. By using an upper and lower threshold we can 
ensure that noisy pixels which are not on the fiducial do not contribute to the 
moment calculations and that noisy pixels which are entirely on the fiducial 
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contribute fully. Euler numbers can also be used to verify that the thresholds 
have been properly chosen. From (5.1) through (5.6) the centroid of the region 
and the moment about the axis of greatest inertia can be calculated. 2 



X = 'Sx/y— (5.8) 

mo 

y = ^ (5.9) 

mo 

-2 



-fmax = [rn x 2 - moy J Si n 6 + 2 (s x / y raxy - moxyj COS 6 5\ n 6 + 
'x/y m y ; 

1 . / 2m 



s x/v m v 2 — mox 2 ) C05 2 9 (5.10) 



6* = -arctan 



J xy 



2 \s x / y m y 2 - m x 2/s x /y/ 

It is well known that the perspective projection of a circle is an ellipse. The 
semi -major axis of the el I ipse can be calculated using the foil owing equation: 3 



4/ mflX (5 _ n) 



\ m (l + r 2 /rl) 

For now we will assume that orthographic projection is a reasonable model for 
the area immediately surrounding a fiducial, see Figure 5-3. Later we will 
consider the error introduced by this assumption, Figure 5-4. This error is 
sometimes referred to as perspective distortion. Given this assumption, the 
centroid of the circle C c projects onto the centroid of the ellipse C e and the 
diameter d' projects onto the major axis of of the el I ipse, a'. The diameter d' is 
parallel to the image plane so it is not foreshortened. Thefollowing equations 
relate the fiducial to its projection. 

(5.12) 

(5.13) 
(5.14) 

It should be noted that s, x', y' , x* and y* are the same parameters as in (4.8) 
through (4.10). x', y' and s can be easily calculated requiring time linear in the 
number fiducials and their size. At first, the need to use s^/y in recovering s 



a' 


= 


z* 


a' 
= d* = 


z* 


X 


= 








y 


= 


sy =y 







2 The quantity shown for 7 max should actually be multiplied by an additional factor of ^/ y . 
We have omitted it for simplicity sake because it cancels with the same factor for wq in (5.11). 

3 Our fiducials have holes in them and the additional factor of (l + r?/rl) in the denominator 
corrects for this, r, istheinner radius and r is the outer radius. 
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Top View of 
Projection 




Original 
Circle 



1 Projection 

" Ce nf Circle 




Figure 5-3: Orthographic projection 
of a circle. 



F i gu re 5-4: P erspecti ve proj ecti on of 
a circle. 



would appear to a significant limitation. This is not the case because s x/y is easy 
to calibrate and it does not change [Lenz andTsai, 1988, Penna, 1991]. s x/y is 
a function of the aspect ratio of the image sensor and the ratio of camera and 
frame grabber clock frequencies. The physical properties of the image sensor 
cannot changeand modern clocks haveextremely stablefrequencies. Therefore 
it is very reasonable to calibrate s x/y once and then forget about it. s x/y could 
also be determined via self-calibration removing any burden to the user. 



5.2 E r ror Analysi s 

There are several sources of error associated with processing digital images. 
One of the more significant sources is quantization errors [Kamgar-Parsi and 
Kamgar-Parsi, 1989]. These errors are the result of taking a continuous signal 
and converting it to digital values. First, we will examine errors caused by 
the fact that pixel values are only available at discrete locations in a lattice. 
Consider a row of pixels such asthose shown in Figure 5-5. The grid represents 
the pixel lattice. Pixels have just two states, on and off, with shaded squares 
representing on pixels. Using the centroid calculation described above both 
rows have the same centroid. The maximum error in the horizontal position 
of the centroid is 0.5 pixels. In the vertical direction the maximum is also 0.5 
pixels so the maximum distance between the calculated centroid and the actual 
centroid is 1/^2. 

The maximum error can be reduced significantly by using a circular shape 
[Bose and Amir, 1990, Efrat and Gotsman, 1993]. Figure 5-7 shows a digital 
approxi mati on of a ci rcl e. The i mprovement wh i ch results from usi ng a ci rcul ar 
shape is caused by the fact that the error for a given row or column is dependent 



5.2. ERROR ANALYSIS 



51 



Figure 5-5: The effect of quanti- 
zation errors on the centroid of a 
row of pixels. 



Figure 5-6: The effect of quanti- 
zation errors on the length of a 
row of pixels. 




Figure 5-7: A digital approxima- 
tion of a circle. 



Figure 5-8: Model for grey scale 
pixel values. 
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Figure 5-9: Error in the centroid of a circular fiducial. Error is the distance 
in pixels between the actual centroid and that of a digital approximation. The 
radius is also expressed in pixels. 

upon the errors of the other rows or columns. In short, the errors tend to cancel 
out. For a circle of radius r and centroid [x y ] the maximum error in the 
calculated centroid //(r) is given by the foil owing expression. 



fi(r) = max 

xq yo dr 



\[x yo] - CE NTROI D (x , y , r, dr)\\) 



5.15) 



CE NTROI D() is a function which calculates the centroid of a digital approxi- 
mation of a circle using (5.1) through (5.3), (5.8) and (5.9). p(x,y) is replaced 
with INSIDE7Q which returns a 1 if (x - x ) 2 + (y - yo) 2 < (r + dr) 2 otherwise 
it returns 0. Figure 5-9 shows the maximum error in the centroid calculation 
fi(r). The circles arethe result of evaluating (5. 15) for 0.0 < x ,y ,dr < l.Owith 
0.01 increments. The crosses are the result of a stochastic sampling method to 
find the maximum over the same region. 1/^27 is also plotted on the axes. The 
curve fits the data well and is a good estimate of //(r). 

The radius calculations described above are subject to errors similar to 
those seen for the centroid. The maximum error in the length is 1 pixel, see 
Figure 5-6. Using a circular shape will also reduce the error in the calculated 
radius for the same reasons as above. For a circle of radius r and centroid 
[x yo] the maximum error in the calculated radius v(r) isgiven by the foil owing 
expression. 



v{r) 



max 

xq yo dr 



RADIUS(£o,j/o,r»||) 



5.16) 



RADIUSQ is a function which calculates the radius of a digital approximation 
of a circle using (5.1) through (5.6), (5.8) through (5.10) and (5.11). Again, 
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Figure 5-10: Error in the radius of a circular fiducial. Error is the difference 
in pixels between the actual radius and that of a digital approximation. The 
radius is also expressed in pixels. 



p(x,y) is replaced with INSIDE7Q which returns a 1 if (x - x ) 2 + (y - yo) 2 < 
(r + dr) 2 otherwise it returns 0. Figure 5-10 shows the maximum error in 
the radius calculation v{r). The circles are the result of evaluating (5.16) for 
0.0 < xo,yo,dr < 1.0 with 0.01 increments. The crosses are the result of a 
stochastic sampling method to find the maximum over the same region. ^2/3r 
is also plotted on the axes. 4 The curve fits the data well and is a good estimate 

Of v(r). 

The maximum error for both thecentroid and radius can be further reduced 
by using grey scale values rather than binary values [Chiorboli and Vecchi, 
1993]. Grey scale values can be modeled as the sum of a number of binary 
sub-pixels. Figure 5-8 shows a pixel with a dynamic range of 122 and a value 
of 25. This effectively increases the pixel resolution by the square root of the 
dynamic range, 



Supper - Slower + 1- This i ncrease in the resolution i ncreases the 
effective radi us of the ci rcl e. 

Another class of quantization error is caused by the fact that pixel values are 
the result of a spatial process. The value of a particular pixel is not the intensity 
at some infinitesimal point, rather it is the average intensity within the area 
of the pixel. Figure 5-8 shows a model of a grey scale pixel. If the shaded and 
unshaded portions of the pixel represents maximum and minimum intensity 



4 The factor of \f2/3 results because (5.11) assumes a elliptical shape and our digital ap- 
proximations are not truly ellipses. This error is most pronounced at small radii however some 
error will always be present. 
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^P 




Figure 5-11: A pixel partially 
covered by a larger figure. 



Figure 5-12: Effect on a circular 
figure. 



respectively, then the pixel has a value of 0.2 or 25 on a scale from to 121. If 
the shaded portion is produced by a larger rectangular figure for which we are 
calculating moments, the correct values for x and y to use in (5.2) through (5.6) 
are the x and y components of thecentroid of the shaded region. Thecentroid 
of the shaded region is not the center of the pixel, however (5.2) through (5.6) 
assume that it is, in essence treating the pixel as if it were homogeneous. The 
fact that the centroid of the region which produces a pixel's value may not be 
the center of the pixel introduces error into the moment calculations. Figure 
5-11 shows a pixel only partially covered by a larger figure, p is fraction of the 
pixel covered by the larger figure. Thecalculated and actual contribution to the 
first moment, mi calculated and mi actual , as well as their difference, Ami, are given by 
the foil owing equations: 



m\ 



calculated 



py 



(5.17) 



mi 



p y 



p 



(5.18) 



Ami 



mi 



m l* 



p(l-p)/2 



(5.19) 



A maximum Ami of 1/8 occurs when p = 1/2. Ami cannot be negative therefore 
the maximum error results when Ami = 1/8 along one side of the figure and 
Ami = along the other. We will assume that Ami = 1/8 along the half circle 
shown in Figure 5-12. 5 The circle has a radius of r and Ami is towards the 



5 This assumption overestimates the error on two counts. First A^ cannot equal 1/8 
everywhere along the hemi-circle. Second Ami assumes that the partial figure is aligned with 
the pixel grid. If it is not, the maximum error is reduced. 
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Figure 5-13: Error in the centroid 
resulting from the homogeneous as- 
sumption. Both the error and radius 
are expressed in pixels 



Figure 5-14: Error in the radius re- 
sulting from the homogeneous as- 
sumption. Both the error and radius 
are expressed in pixels. 



center of the circle producing the following expression for the contribution to 
the y component of the centroid: 



Ami Jx) 



8r 



(5.20) 



Integrating this expression from -r tor and dividing by the area results in a 
maximum error in the centroid of Ay ma x = l/16r as shown in Figure 5-13. 

An analysis of the error in the second moment is similar. The calculated 
and actual contribution to the second moment, m 2calculated and m 2actual , as well as 
their difference, Am 2 , are given by the foil owing equations: 



m 2r , 



P 2T+1/12 



(5.21) 



mx 



actual 



p y 



+ P 3 /12 



(5.22) 



Am 2 



m 2r , 



py{l- p) 



m 2 act 
2p 
"12 



l-3p + 2p : 



(5.23) 
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Figure 5-15: The ideal intensity pro Figure 5-16: The effect of bleeding on 
filefor a cross section ofa circular disk, the disk in the figure to the left. 



The maximum Am 2 and the valueof pat which it occurs arefunctions of y. We 
will assume that p max = 1/2 and Am 2max = j//4. 6 The maximum error results 
when Am 2 = y/4 around the entire circumference of the circle. Doubling Am 2 
and correcting for the orientation produces 



Am 2 (:c) 



(5.24) 



Integrating this expression from -r tor and substituting into (5.11) results in 

a maximum error in the radius of Ar max = r - 

5-14. 



2 + 8/37T as shown in Figure 



Next we will consider image formation errors. These errors include noise 
and nonlinearities in the image sensor [Dinstein et al., 1984, Healey and Kon- 
depudy, 1994]. For our purposes the most significant phenomenon is the 
smoothing of high contrast edges. We will refer to this as bleeding. Figure 
5-15 shows the ideal intensity profile for a cross section of a circular disk and 
Figure 5-16 shows the effect of bleeding on the same disk. We have shown the 
transition from maximum intensity to minimum intensity as linear. This is 
almost certainly not the case, however it makes little difference for our analy- 
sis. As long as the transition has the same shape all along the circumference 
of the circle, bleeding has no effect on the centroid. The inertia of the two 
disks shown in Figures 5-15 and 5-16, however are not the same, therefore the 
radius calculation iseffected by bleeding. In order to explore this effect we will 
consider the el I ipse produced by the foil owing fundi on 



f(x,y) 



min(max(u,0), 1) 



(5.25) 



l /a 2 - y 2 /b 2 



1 — (a — w) J a* 



(5.26) 



a and b arethe semi-major and semi-mi nor axis of the ell ipseand^ is the length 
of the transition, v was chosen because it is a good approximation to a linear 
transition and is easily integrable. The ellipse is shown in Figure 5-17. The 



6 Actually p max = 1/2 - y ± \/y 2 + 1/12. This function rapidly approaches an asymptotic 
valueof 1/2. For y = 4.2 the actual Am2 max is within 2% of 1/2. 
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Figure 5-17: Intensity profile for an ellipse. 



zeroth and second moments (about the axis of greatest inertia) for f(x,y) are 
as fol I ows. 



mo 



irb 



w 



law + la' 



(5.27) 



m 2n 



7T( 



12a 



w — 4w a + Iw a — 6a w + 3a 



(5.28) 



We will assume that the actual edge occurs at a - w/2 along the major axis. 
The calculated semi-major axis can be found by substituting m and m 2max into 
(5.11). The ratio of the actual edge location to the calculated semi-major axis 
rj(a,b,w) is a measure of the error introduced by bleeding and is shown below. 



7/(a, 6, w) 



(a — w/2) 



3(w 2 



caw 



+ 2c 



\ 2 (w^ — 4w^a + 7w 2 a 2 — Qa^w + ~ia l 



(5.29) 



Figure 5-22 shows a plot of ??(a, 6, w). The transition lengths we have encoun- 
tered aretypically less than one pixel. 

So far in our discussion of errors (with the exception of bleeding) we have 
considered circles not ellipses. The analysis extends easily to cover ellipses. 
Two effects are seen as the figure becomes an ellipse. First the effective radius 
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F i gu re 5-18: The effect of quanti- Figure 5-19: The effect of quan- 

zation errors on the centroid for tization errors on the radius for 

a rectangular figure at an angle a rectangular figure at an angle 

to the pixel lattice. to the pixel lattice. 



is the semi -mi nor axis 6. Second, additional error is introduced when the major 
axis is not aligned with the pixel lattice. Figures 5-18 and 5-19 show two 
examples of the latter effect. By modifying (5.15) slightly we obtain: 



//(r, b/a) 



max 

xq |/o dr 9 



\[x yo] - CE NTROI D (x , y , r, dr, b/a, 8)\\) 



5.30) 



b/a isthe ratioofthe minor axistothe major axis. CENTROID() and INSIDE?() 
are modified appropriately to handle ellipses at any angle relative to the x 
axi s. F i gure 5-20 shows the maxi mum error i n the centroi d cal cu I ati on //(r, b/a). 
A stochastic sampling method was used to find the maximum of (5.30) over the 
region 0.0 < x , yo, dr < 1.0, < < vr and r = 10. 1 +W*-jK 1 ~ b / a ) is a | so p | ot ted 
on the axes. The curve fits the data well and is a good estimate of //(r, b/a). By 
modifying (5.16) slightly we obtain: 



v{r, b/a) 



max 

xq yo dr 9 



RADIUS (x ,y ,r,dr,b/ a, 0)\ 



5.31) 



RADIUS() is modified appropriately to handle ellipses at any angled relative 
to the x axis. Figure 5-21 shows the maximum error in the radius calculation 
v(r,b/a). A stochastic sampling method was used to find the maxi mum of (5.31) 

over the region 0.0 < x ,y ,dr < 1.0, < < % and r 



1Q l+(V2-l)y/l-b/a 



IS 



also plotted on the axes. The curve fits the data well and is a good estimate of 

z/(r, b/a). 
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Figure 5-20: Centroid error for an el- 
liptical fiducial with a radius of 10. 
The error is expressed in pixels and 
b/a is the ratio of minor and major 
axis. 



Figure 5-21: Radius error for an ellip- 
tical fiducial with a radius of 10. The 
error is expressed in pixels and b/a is 
the ratio of minor and major axis. 



Finally we will consider the errors introduced by our assumption of ortho- 
graphic projection for a fiducial. As shown in Figure 5-4, some error is intro- 
duced in the centroid as well as the semi-major axis. Consider a plane rotated 
by an angle of </> about an axis passing through [X Y Z ] in camera centered 
coordinates which is parallel to the x axis. Let [X Y] represent a point on the 
plane and let the origin of the plane be [X Y Z ]. Points on the plane project 
on to the image plane by the foil owing relationships 



Y si n <j> + Z 

/(FCOS^ + Fq) 

Y si n <j> + Z Q 



(5.32) 
(5.33) 



Next, consider a circle in the plane and centered at the origin with a radius of 
r. We can determine the effects of perspective distortion on the centroid and 
radius calculation by evaluating foil owing continuous versions of (5.1) through 
(5.6). 
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mo = 


^ r 2_,2 

= / / ^-^SxSy 

J —r J — \J r*- — y*- 


m x = 


r f^ r2 ~ y2 'a 'a ' 

= x bx by 

J — r J — Wr 2 — y 2 


m y = 


^J r 2_ y 2 

= / / r —-ytxby 

J —r J — -yr — y^ 


m x 2 = 


= / r-—y a Sx'6y' 

J — r J — a/ r 2 — y 2 


mxy = 


^J r 2_ y 2 

= x y bx by 

J — r J — Wr — y 2 


m y 2 = 


yr ry/r 2 -y 2 

= / / r^^' <^'<V 
J— y J — \/r z — y z 



(5.34) 
(5.35) 
(5.36) 
(5.37) 
(5.38) 
(5.39) 



The error in the x component of the centroid a is the difference between the 
projection ofx and m x /m . The error in they component /3 isdefined similarly 



a = Z 2 -r 2 s^ - ^o" (5 - 40) 



/ (zlr 2 s\ n 3 </> + Y r 2 cos </>si n 2 </> - Z r 2 si n (f) - Y^Zq si n <f> + ^o^b cos </>) /y 
^^- r 2 sin 2 ^) (Z cos s^>- losing) ^o 

The equation quantifying the effect of our orthographic projection assumption 
on the semi -major axis is as follows. 

7 = !£ (5.41) 

The full version is much too messy to include here, a is the semi -major axis 
of the projection and can be solved for using (5.34) through (5.39), (5.10) and 
(5.11). Figures 5-23 through 5-27 show plots of a, (3 and 7 for typical values: 
Ro = 10cm, Z = 100cm, r = 0.5cm and / = 2000 pixels. X and Y are 



converted to polar coordinates such that Rq = ^Jx 2 + F 2 . R is fixed and 6 is 
one of the axes plotted. Although we have not found it necessary, estimates of 
the transition length and the mi nor axis length can be easily obtained from the 
imageand used to improve the calculated centroid and semi-major axis. 
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Figure 5-22: The effect of bleeding 
on the radius calculation. Error is 
expressed as the ratio of the actual 
radius and the calculated semi-major 
axis, d'/a'. The transition length and 
radius are expressed in pixels. 



Figure 5-23: Perspective error in the 
centroid parallel to the major axis. 
The error is expressed in pixels. 
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Figure 5-24: Perspective error in the Figure 5-25: Total perspective error in 
centroid perpendicular to the major the centroid. The error is expressed in 
axis. The error is expressed in pixels, pixels. 
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Figure 5-26: Perspective error in the Figure 5-27: Perspective error in the 
semi -major axis for R = 10cm. Error semi -major axis for R = 10cm. Error 
is expressed as the ratio of the calcu- is expressed as the ratio of the calcu- 
lated value and the actual radius. lated value and the actual radius. 



5.3 Experiments 



I n the absence of noise, two images of a fiducial taken from the same position 
should produce the same centroid and semi-major axis. If our calculations 
were exact, a series of images taken from positions displaced only in a direction 
perpendi cu I ar to the opti c axi s shou I d produce centroi ds that vary I i nearly with 
position. Similarly, a series of images taken from positions displaced only in 
a direction parallel to the optic axis should produce semi -major axes that vary 
linearly with the inverse of position. The degree to which real data deviate 
from these ideals is an empirical measure of the accuracy of our calculations. 

Two sets of experiments were conducted using an optical bench. The first 
set of experiments examined the centroid calculations, the second set the semi- 
major axis calculations. A rail with a precision positioner runs along one side 
of the optical bench. For the first set of experiments, a camera was mounted 
on the positioner with its optic axis perpendicular to the rail. This setup 
allows camera motion only in the x direction. Motion in the y direction is 
achieved by rotating the camera 90° in its mounting. A Klinger DCS-750 
motor controller with UE-72CC positioner was used to precisely position the 
camera. The controller/positioner combination is accurate to a few microns. 
On the optical bench roughly a meter away a fiducial was mounted so that it 
appeared near the center of the camera's field of view, see Figures 5-28 and 
5-29. E xperi ments were performed for the fol I owi ng cases: 
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fiducial 



fiducial 



positioner 
and camera 



positioner 
and camera 



Optical Bench 

Figure 5-28: Top view of experimental 
setup for centroid. 



1 



Optical Bench 

Figure 5-29: Side view of experimen- 
tal setup for centroid. 



• 10 micron and 100 micron steps 

• Motion in both the x and y directions 

• Fiducial parallel to the image planeand at a 45° angle 

• Light and dark images 

For each experiment, data was collected at 26 camera positions along the rail. 
At each position, 100 images were collected. The mean and standard devia- 
tion were calculated for each position. The results are shown in Figures 5-30 
through 5-41. The mean at each position is marked by an "x". The error bars 
are 1 standard deviation above and below the mean. The line is the least 
squares best fit to the means. Table 5.1 shows both the theoretical and empir- 
ical accuracy of the centroid calculation for three conditions. The theoretical 
and empirical results are in good agreements. 



Condition 


Dynamic 
Range 


Major 
Axis 


Empirical 
Accuracy 


Theoretical 
Accuracy 


Bright, flat 


80 


17.5 


0.065 


0.087 


Dark, flat 


6 


17.5 


0.15 


0.16 


45° Angle 


40 


17.5 


0.22 


0.16 



Table 5.1: Empirical and theoretical accuracy for centroid calculations. 
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Figure 5-30: Data for bright image, 
camera motion in tries direction with 
100 micron steps and fiducial parallel 
to the image plane. 




Camera Position (1 unit = 10 microns) 



Figure 5-31: Data for bright image, 
camera motion inthex direction with 
10 micron steps and fiducial parallel 
to the image plane. 




50 100 150 200 

Camera Position (1 unit = 10 microns) 



Camera Position (1 unit = 10 microns) 



Figure 5-32: Data for bright image, 
camera motion in they direction with 
100 micron steps and fiducial parallel 
to the image plane. 



Figure 5-33: Data for bright image, 
camera motion in they direction with 
10 micron steps and fiducial parallel 
to the image plane. 
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Camera Position (1 unit - 10 microns) 



Figure 5-34: Data for dark image, Figure 5-35: Data for dark image, 

camera motion in thex direction with camera motion in thex direction with 

100 micron steps and fiducial parallel 10 micron steps and fiducial parallel 

to the image plane. to the image plane. 
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Figure 5-36: Data for dark image, 
camera motion in they direction with 
100 micron steps and fiducial parallel 
to the image plane. 



Figure 5-37: Data for dark image, 
camera motion in they direction with 
10 micron steps and fiducial parallel 
to the image plane. 
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Figure 5-38: Data for bright image, 
camera motion in tries direction with 
100 micron steps and fiducial 45° to 
the image plane. 
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Figure 5-39: Data for bright image, 
camera motion inthex direction with 
10 micron steps and fiducial 45° to the 
image plane. 
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Camera Position (1 unit = 10 microns) 



Figure 5-40: Data for bright image, 
camera motion in they direction with 
100 micron steps and fiducial 45° to 
the image plane. 
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Camera Position (1 unit = 10 microns) 



Figure 5-41: Data for bright image, 
camera motion in they direction with 
10 micron steps and fiducial 45° to the 
image plane. 
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Optical Bench 



fiducial 



fiducial 



positioner 
and camera 



positioner 
and camera 



Optical Bench 

Figure 5-42: Top view of experimental Figure5-43: Side view of experi men- 
setup for semi -major axis. tal setup for semi -major axis. 

A similar set of experiments were conducted for the semi-major axis. For 
this set of experi ments, the camera was mounted with its optic axis parallel to 
the rail. This setup allows camera motion only in the z direction. As before 
a fiducial was mounted on the optical bench roughly a meter away so that it 
appeared near the center of the camera's field of view, see Figures 5-42 and 
5-43. E xperi ments were performed for the fol I owi ng cases: 

• Fiducial parallel to the image planeand at a 45° angle 

• Light and dark images 

For each experiment, data was collected at 26 camera positions along the rail. 
At each position, 100 images were collected. The mean and standard deviation 
were calculated for the major axis at each position. The least squares best fit to 
the means was also found. The results are shown in Figures 5-44through 5-46. 
Table 5.2 shows both the theoretical and empirical accuracy of the semi -major 
axis calculation for three conditions. The theoretical and empirical results are 
in good agreements. 



Condition 


Dynamic 
Range 


Major 
Axis 


Empirical 
Accuracy 


Theoretical 
Accuracy 


Bright, flat 


40 


17.5 


0.15 


0.16 


Dark, flat 


4 


17.5 


0.23 


0.24 


45° Angle 


50 


16.8 


0.24 


0.23 



Table 5.2: Empirical and theoretical accuracy for semi-major axis calculations. 
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Camera Position (1 unit = 10 microns) 



1000 2000 3000 4000 

Camera Position (1 unit = 10 microns) 



Figure 5-44: Data for bright image, Figure 5-45: Data for dark image, 

camera motion in the,? direction with camera motion in the,? direction with 

2000 micron steps and fiducial parallel 2000 micron steps and fiducial paral- 

tothe image plane. lei to the image plane. 




1000 2000 3000 4000 

Camera Position (1 unit = 10 microns) 



Figure 5-46: Data for bright image, 
camera motion in the,? direction with 
2000 micron steps and fiducial 45° to 
the image plane. 
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5.4 Discussion 

An accurate, efficient and robust method of locating fiducial s has been de- 
scribed. A thorough error analysis of the method has been provided including 
both theoretical and empirical data. This data confirms that in the worst case 
fiducials can be located to within 0.25 pixels and the local scale factor can be 
determined to within 1.5%. 
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Chapter 6 

Initial Calibration 



Before our fiducials can be used to perform enhanced reality visualizations, 
their model coordinates must be known. In cases were the model coordinates of 
the fiducials are not known a priori, an initial calibration must be performed. 
An initial alignment is performed using data from a laser scanner [Grimson et 
al., 1994]. This initial alignment along with an image showing the fiducials is 
then used to lookup the model (MR or CT) coordinates of the fiducials. Figure 
6-1 shows an overview of this process. Once the coordinates of the fiducials 
have been established the laser scanner is no longer needed and any camera 
which can view the fiducials can be used for enhanced reality visualization. In 
this chapter we provide a general discussion of how the laser scanner works 
and then give the details of fiducial calibration. 




Laser Scanner Alignment 




V i deo I mage of P ati ent 
with Fiducials Attached 



Model of Patient's 
Brain Obtained from 
M R and/or CT Data 




Model of Patient's 
Brain and Fiducials 



Figure 6-1: Determining the model (MR) coordinates of the fiducials. 
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u 



Object being scanned 



Object being scanned 



Figure 6-2: Side view of scanner. 



6.1 Laser Scanner 



Figure 6-3: Object and laser light 
plane from video camera perspec- 
tive. 



For the skull data shown in Chapter 7, the initial calibration was performed 
with the aid of a laser range scanner produced by Technical Arts Corporation. 
It produces a planar sheet of light which is scanned using an oscillating mirror. 
A video camera is placed at an angle to the plane of light, see Figure 6-2. 
The x axis is parallel to the line segment joining the laser and video camera. 
The laser is oriented so that the line of illumination formed when the laser's 
plane of light strikes an object is perpendicular to the x axis. The y axis is 
parallel to the line of illumination, see Figure 6-3. The z axis is orthogonal 
to both the x and y axes. The y axis in Figure 6-2 is into the page and the x 
axis in Figure 6-3 is out of the page. The three dimensional coordinates of an 
object's surface can be determined if it is placed so that it can be simultaneously 
illuminated by the laser and viewed by the video camera. They coordinateof a 
data point is determined directly from the horizontal displacement measured 
by the video camera. Thex and z coordinates are recovered using the vertical 
displacement and the scan angle of the laser beam. A single scan produces 240 
three dimensional measurements with accuracies up to 0.003". 

Surface points on the patient's head near the surgical site are measured 
with the scanner. These data points are aligned with a model of the patient's 
head and brain obtained from previous MR and/or CT scans [Grimson et al., 
1994]. The registration is produced by minimizing the sum of the squared 
distance between the laser data and the model. The model is sampled at several 
resolutions to speed convergence and random perturbations are used to avoid 
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Laser 



Video 
Camera 






Figure 6-4: Model of patient's brain Figure6-5: Patient, scanner and coor- 
with coordi nate axes. di nate axes. 




Laser 



Video 
Camera 




M 







Figure 6-6: Model of patient's brain 
aligned with patient (validonlyfor the 
scanner). 



Figure 6-7: Model of a skull. 
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Top view Side view End view 

Figure 6-8: Laser data from skull. 




Figure 6-9: Video of skull being scanned. 
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Figure 6-10: Laser data aligned 
with skull. 



Figure 6-11: Model aligned with 
vi deo of the same skul I . 



local minima. This produces a transformation from the coordinate frame of the 
model to that of the patient. Figure 6-4 shows a brain model and its coordinate 
system. Figure 6-5 shows the patient, scanner and their coordinate system. 
Figure 6-6 shows the result of aligning the model and the patient coordinate 
systems. It should be noted that Figure 6-6 is valid only from the perspective 
of the camera attached to the laser scanner. 

Figure 6-7 shows a model obtained from CT imaging of a plastic skul I. Figure 
6-8 shows the three dimensional data obtained from laser scanning the same 
skull. Figure 6-9 shows two images of the skull containing laser scan lines. 
Figure 6-10 shows the laser data aligned with the skul I. Figure 6-11 shows the 
model of the skull aligned with and superimposed upon video of the skull. The 
accuracy of this registration is bel i eved to be on the order of the resolution of 
the M R or CT data (« 1mm). 



6.2 Calibration Routine 

The transformation used to produce Figure 6-11 essentially maps model coor- 
dinates to image coordinates. This is exactly the inverse of what we need to 
calibrate our fiducial s. What we would I ike to be able to do is measure the i im- 
age coordinates of the fiducial sand then use an inverse mapping to determine 
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their model coordinates. The inverse mapping is not in general a one to one 
mapping. The i mage coordinates of a fiducial determine the ray in three space 
along which the model coordinates of the fiducial must lie. By intersecting 
this ray with the model and Z-bufferingthe result we can determine the model 
coordi nate of the fiduci al . 

The initial calibration is limited by the accuracy of the registration per- 
formed using the laser scanner. The laser scanner registration produce results 
which are both repeatable and good for the single view in question. It is not 
clear how accurate the registration is in an absolute sense. Absolute accuracy 
is important for calibrating the fiduci als. For example, small errors which are 
imperceptible from one point of view frequently lead to large errors from dif- 
ferent view points. Perturbing the laser scanner solution by less than 1° can 
change a fiduci al's location by over 6mm. The uncertain accuracy of the fiducial 
calibration bears significantly on the quality of enhanced reality visualizations 
which can be produced. In spite of this issue, the results shown in Chapter 7 
are promising. The exact source and nature of the errors in the initial calibra- 
tion needs to be explored further. The method of initial calibration presented 
here, whileit requires the use of a laser scanner, has the advantage that it can 
be made fully automatic. Of course, other methods of initial calibration could 
be used. The only requirement is that the fiducial locations be determined 
accurately. How this information is obtained does not matter. 



Chapter 7 
Results 

7.1 Test Object 

To determine the accuracy of our method we performed several experiments 
using a special test object. A three dimensional object with seven fiducial s 
was made. The relative positions of the fiducials were accurately measured. 
Figure 7-1 shows the basic test object. The large pillar near the center is used 
to measure the accuracy of the enhanced reality visualization. A wireframe 
corresponding to the edges of the pillar is displayed in the enhanced reality 
image. The difference between the actual edges and wireframe is a measure 
of the accuracy of the visualization. For comparison purposes a wireframe is 
also superimposed on a shorter pillar. Experiments were performed with four 
slightly different fiducial configurations. The first configuration consists of 6 
coplanar fiducials plus a single fiducial 1cm above the plane. The enhanced 
reality visualizations produced from this configuration are not consistently ac- 
curate. Figures 7-2 through 7-5 are typical of the results for this configuration. 
Notice that the wire frame on the smaller pillar matches in all of the images. 
Errors occur only significantly away from the volume enclosed by thefiducials. 
The second configuration moves one of the 6 coplanar fiducials so that it is also 
lcm above the plane. Figures 7-6 through 7-8 show typical results for this 
configuration. The accuracy is greatly improved with the addition of a second 
noncoplanar fiducial, however occasional slight errors do occur. The next config- 
uration adds a third noncoplanar fiducial. Typical results for this configuration 
areshown in Figures 7-9 and 7-10. With this configuration, errors rarely occur 
and when they do they are small. Thefinal configuration is similar to the first 
except that the noncoplanar fiducial is 4cm out of the plane. Results for this 
configuration areshown in Figures 7-11 and 7-12. The accuracy of this config- 
uration is comparable to that of the last. Some depth in the model is required 
to accurately recover the third dimension. Once sufficient depth is present in 
the model, it appears that the solution is accurate over a large volume. 

As shown in Figures 7-2 through 7-12, under reasonable conditions our 
method produces good results. These figures provide only a subjective measure 
of accuracy. Next we will attempt to provide a more quantitative measure. 
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F i gu re 7-1: Test obj ect. F i gu re 7-2: Test obj ect vi ew 1. 




F i gu re 7-3: Test obj ect vi ew 2. F i gu re 7-4: Test obj ect vi ew 3. 
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F i gu re 7-5: Test obj ect vi ew 4. F i gu re 7-6: Test obj ect vi ew 5. 




HE 





F i gu re 7-7: Test obj ect vi ew 6. F i gu re 7-8: Test obj ect vi ew 7. 
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F i gu re 7-9: Test obj ect vi ew 8. F i gu re 7-10: Test obj ect vi ew 9. 





Figure 7-11: Test object view 
10. 



Figure 7-12: Test object view 
11. 
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Average 
Distance 


RMS 
Distance 


Maximum 
Distance 


Median 
Distance 


Number of 
Points 


1.35 


1.46 


2.94 


1.38 


848 



Table 7.1: Distance between edge-based and enhanced reality vertices. 



Average 
Misalignment 


RMS 
Misalignment 


Maximum 
Misalignment 


Median 
Misalignment 


Number of 
Points 


0.88 


0.91 


1.77 


0.87 


212 



Table 7.2: Misalignment between edge- based and enhanced reality polygons. 



Given that the goal of enhanced reality visualization is to produce an enhanced 
image, the proper way to assess accuracy is to consider the deviation of the 
enhanced image from the ideal image. For various reasons we do not have 
access to the ideal image. However, we can compare the deviation between 
the image of a physical object and the enhancement produced from a model of 
the same physical object. For example, we could compare the wire frame to 
the edges of the test object's central pillar. This is essentially what we will do. 
Figure 7-13 shows the same test object used above with one change, the top of 
the central pillar had been darkened to provide high contrast edges. Edge pixels 
are extracted and chained together using a Canny edge detector. A line is fitted 
to the chains by minimizing the distance between the edge pixels and the line. 
These lines are used to compute two measures of accuracy. Thefirst measure is 
the distance between the vertices of the darkened region as determined by our 
method (enhanced reality visualization) and those determined by intersecting 
the lines recovered above. The second measure is the area of misalignment 
di vi ded by the peri meter or the average di stance between the two polygons, see 
Figure 7- 14. 

A sequence of 212 i mages of the test object rotati ng through 360" was used. 
The first measure was computed for each of the four vertices in each image. 
The distance between the vertices in each image are shown in Figures 7-15 
through 7-18. The second measure was computed for the darkened polygon 
in each image and the average distance between the polygons for each image 
is shown in Figure 7-19. Tables 7.1 and 7.2 summarize these results. The 
edge-based positions agree well with those produced by our method. It should 
be noted that edges and vertices recovered using the Canny edge detector are 
not without error. The most significant source of error is the fact that the 
implementation used only localizes the edge to the nearest pixel. Asa result of 
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Figure 7-13: Test object used to Figure 7-14: Misalignment of 
quantify accuracy. edge-based rectangle and en- 

hanced reality rectangle. 
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Figure 7-15: Distance in pixels be- Figure 7-16: Distance in pixels be- 
tween edge-based and enhanced real- tween edge-based and enhanced real- 
ity positions for vertex 1. ity positions for vertex 2. 
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Figure 7-17: Distance in pixels be- Figure 7-18: Distance in pixels be- 
tween edge-based and enhanced real- tween edge-based and enhanced real- 
ity positions for vertex 3. ity positions for vertex 4. 






kA 



V\J 



\J\ U t 



0(T 



T5o lio 

image 



2(i0 



Figure 7-19: Average distance be- 
tween edge-based and enhanced real- 
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this, the first measure probably over estimates the difference between the edge 
image and the enhanced image. A signed measure of the distance from point to 
line 1 was also computed to check for correlation in the errors. The average for 
the first measure was 0.09 pixels and for the second measure was 0.04 pixels 
making it unlikely that the errors in edge-based positions are correlated with 
those in the enhanced reality positions. The average difference between the 
two peri meters is likely the better measure of accuracy and using this measure 
our method is accurate to within a pixel. 

7.2 Skull 

After calibrating the fiducial s as described in Chapter 6, enhanced reality vi- 
sualizations were performed from several different view points. Figure 7-20 
shows the initial view of a plastic skull. Figure 7-21 shows the results of the 
registration using the laser scanner. The white dots are CT data points for the 
skull superimposed upon an image of the skull. Figure 7-22 shows the results 
of our method using the initial view point. Note that as expected the error in 
the Figures 7-21 and 7-22 are comparable. Figures 7-23 through 7-32 show the 
results of our method using ten novel viewpoints. The exact source of the mis- 
alignment present in some of the figures is not known. Two likely sources are 
the initial calibration and the fact that thefiducials are nearly coplanar. The 
errors are largest for view points significantly different from that used during 
the initial calibration which suggests that at least some of the misalignment 
is caused by errors introduced during the initial calibration. Also, as noted in 
Chapter 6 small perturbations in the laser scanner alignment cause relatively 
large variations in the calibrated positions of thefiducials. A morerobust initial 
calibration and a method of handling coplanar fiducials, which together should 
eliminate these errors, are currently being investigated. 



^hesign indicates which side of the line the point is on. 
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Figure 7-20: Plastic skull. 



Figure 7-21: Initial registra- 
tion using the laser scanner. 




Figure 7-22: Skull initial view. 
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Figure 7-23: Skull view 1. 



Figure 7-24: Skull view 2. 





Figure 7-25: Skull view 3. 



Figure 7-26: Skull view 4. 
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Figure 7-27: Skull view 5. 



Figure 7-28: Skull view 6. 





Figure 7-29: Skull view 7. 



Figure 7-30: Skull view 8. 
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Figure 7-31: Skull view 9. 



Figure 7-32: Skull view 10. 



Chapter 8 
Conclusions 

8.1 Future Work 

There are many improvements and extensions which can be made to the basic 
method presented in this report. Several of them have been alluded to in 
previous chapters. 

8.1.1 Auto calibration 

The current implementation requires knowl edge of the rati oof pixel spacing in 
thex and y directions, s x/y in order to recover s. While s x/y is extremely stable 
and determining its value is straight forward, it would be nice to eliminate 
this requirement. Given enough noncoplanar fiducials it should be possible to 
solve for s x/y . Solvingfor s x/y might be fairlytime consuming but it should only 
need to be performed once. Similarly, it would be nicetobeableto handle lens 
distortion (although we have not found it necessary). It is a simple matter to 
add a lens distortion model as a preprocessor. The fiducial locations are simply 
corrected by the amount specified by the model. In some cases considering dis- 
tortion would surely improve the results. Unfortunately this requires finding 
the proper distortion model. There are several well established methods for 
determining lens distortion. Ideally, the model would be determined automat- 
ically. Lens distortion changes with several of the other camera parameters 
such as aperture, focus and zoom. Even so, given several data sets consisting 
of i mage poi nts, model points, perspective transformation, aperture, focus and 
zoom settings we should be able to construct a model which takes into account 
the effect of changes to aperture, focus and zoom. It should be necessary to 
construct a new model or modify an old model only occasionally. Our method 
as it is currently implemented models a linear approximation to lens distor- 
tion implicitly. Automatically determining a valuefor s x/y and a model for lens 
distortion are two examples of auto calibration. Auto calibration would bridge 
the gap between what is implicitly modelable and other required or desired 
parameters. The parameters which are implicitly modeled are those which 
can change from one image to the next. The parameters which are candidates 
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for auto calibration are either fixed or vary on much longer time scales. This 
makes it possible to run an auto calibration routine in the background updating 
parameters as necessary but certainly not every frame. Auto calibration would 
enable a completely uncalibrated camera with a poor quality lens to be used to 
produce very accurate enhanced reality visualizations. 

8.1.2 General Features and Self Extending Models 

Where auto calibration enables camera parameters to be recovered, self ex- 
tending models allow recovery of model parameters. Given a partial model and 
several solutions from different view points it should be possibleto recover the 
three dimensional location of points not in the model but which are visible. 
This effectively extends the model. The ability to use general features in place 
of fiducials would be a significant improvement for many potential applications 
and makes self extending models truly useful. The circular fiducial currently 
used facilitates recovery of depth information. The same kind of size informa- 
tion can potentially be recovered from naturally occuring spatial features such 
as patches of texture, etc. Another possible option is to use stereo to recover 
depth information. Since relative depth is all that is required the stereo cam- 
eras need not be calibrated. Using a stereo setup would eliminate the need to 
knows x /y and would facilitate a stereo display. 

8.1.3 Miscellaneous 

Several other more modest improvements or extensions also exist. As noted in 
Chapters 4 and 7, noncoplanar points significantly improve the results of our 
method. With a slight modification to our method it should be possibleto use 
planar data exclusively. The modification involves solving a quartic equation. 
The ability to fundi on when all of the fiducials are coplanar would certainly be 
an asset. It is not clear what accuracy can be achieved by this approach. As 
noted in Chapter 6 there is room for improvement in the initial calibration of 
fiducials. At a minimum a better understanding of the source and nature of 
errors is needed. A more robust method of calibrating fiducials would also be 
very useful. Finally, the current implementation can be optimized significantly 
for speed. 



8.2 Applications 

Neurosurgery isjust one application for enhanced reality visualization. There 
are numerous other potential applications both inside and outside the medical 
field. These applications range from manufacturing and repair to navigation 
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and rescue. Enhanced reality visualization can be most readily applied in 
domains where models either already exist or are easily obtainable. In the 
medical field models are easily obtained using internal anatomy scanners such 
as MR and CT. Models already exist for many repair and manufacturing en- 
vironments. Before en ha need reality visualization will be accepted for routine 
use an accurate and robust method of performing the visualization must be 
developed. Our method makes significant progress towards this goal. Better 
methods of generating models will make it practical to apply enhanced reality 
visualization to many more situations. Because it is anchored in the real world 
it has the potential to affect our everyday lives. For example, enhanced reality 
visualization can be used totranscend the I imitations of computer monitors. By 
moving the display off the desk and into a visor or pair of goggles the display 
becomes much more useful. Multiple virtual screens can be defined. These 
virtual screens are anchored in the real world so the user, rather than fumbling 
with a mouse to expose the desired window, can simply look around and exam- 
ine the contents of the various virtual screens. In effect the size of the display 
becomes unlimited. Since the virtual screens are anchored in the real world 
they are spatially organized which is a powerful organizational metaphor. For 
example, you can define a virtual screen containing a phone list and attach it 
to your bulletin board. Whenever you need to check a phone number on the list 
all that need be done is look at the bulletin board. The phone list will always 
be where you posted it. 

8.3 Discussion 

A new method for performing enhanced reality visualization has been devel- 
oped. The method achieves good results using just a fewfiducials placed near 
the volume of interest. Noncoplanar fiducial s yield better results. Our method 
allows for motion and automatically corrects for changes to the internal cam- 
era parameters (focal length, focus, aperture, etc.) making it particularly well 
suited to enhanced reality visualization in dynamicenvironments. In asurgical 
application, we pi ace a fewfiducials placed near the surgical site immediately 
prior to surgery. An initial calibration is performed using a laser scanner. Af- 
ter the calibration is performed, our method is fully automatic, runs in nearly 
real-time and is accurate to a fraction of a pixel and requires only a single 
i mage. 
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Appendix A 

Effects of Radial Distortion 



As discussed in previous chapters, our method only models a linear approxima- 
tion to radial distortion. In this appendix we will consider the consequences of 
this approximations. Radial distortion is typically modeled as follows [Slama, 
1980]: 



X u 

y'u 

Sx 
Sy 



rc'd + Sx 

y'd + Sy 

(x' d - xq) [K\r' 6 + K 2 r' d 

(y'd - yo) (K\r'\ + A Vd 



(A.l) 
(A.2) 
(A3) 

(A.4) 



Where x' u and y' u are the undistorted pixel coordinates, x' d and y' d are the dis- 
torted pixel coordinates with r'\ = (x' d - x ) 2 + (y' 6 - y ) 2 , and K\ and K 2 are 
the radial distortion coefficients. If K\ and K 2 are known, it is quite simple 
to use this radial distortion model to correct the data before passing it to our 
method. However, what \f Ki and K 2 are not known? The linear approximation 
contained in our method works well if radial distortion is not too great. It also 
works well if the fiducials are near the location of the enhancement. 

As a preliminary step, we measured the radial distortion for our camera 
setup. Radial distortion was measured using the plumb-line method [Brown, 
1971, Stein, 1993] for a 16mm and 25mm lens. The results are summarized in 
TableA.l. Plots of the distortion arealso shown in Figures A-l and A-2. 

In order to gain some in sight intothe effect of radial distortion on our method 
we will attempt to quantify the difference between the radial distortion model 



Lens 


XQ 


2/0 


K X 


K 2 


16mm 


336 


246 


2.0e-8 


2.0e-13 


25mm 


332 


248 


-4.0e-8 


1.0e-13 



TableA.l: Radial distortion parameters for a 16mm and 25mm lens. 
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described above and the linear approximation included in our method. To quan- 
tify the difference, a set of 5 three dimensional control poi nts K were selected 
and a perspective transformation V (composed of a rigid transformation and a 
camera calibration matrix as described in Chapter 4) was defined. The control 
points were then projected onto the image plane using (4.2) producing a set of 
undistorted image poi nts I u . Theundistorted image poi nts were then distorted 
by 8d= [Sx 6y] based on the radial distortion measured above producing a set of 
distorted image poi nts id- Now we have a set of control (model) poi nts and a set 
of distorted image points. This is exactly the information that our method uses 
tocomputea perspectivetransformation. Wecomputea new perspective trans- 
formation V using (4.8) through (4.10). The effect of the linear approximation 
on an arbitrary three dimensional point Mean be expressed as fol I ows: 

v = \\(MV + Sd)-MV'\\ 2 (A.5) 

Figures A-3 and A-4 show two plots of y for a plane parallel to the image 
plane and passing through the control points. The five spi kes mark the image 
locations of the five control points. N otice that globally y is no worse than the 
raw radial distortion. Further, y is small in the vicinity of the control points 
even when the control points are located at the edge of the image. These two 
characteristics were true for all of the simulations that we ran. This confirms 
the claim that our method works well if the radial distortion is not too great or 
the enhancement is close to the fiducials. 

Finally, we performed an experiment using a I ens with significant distortion. 
FigureA-5 shows an image taken using a 4.8mm lens. The test object is the 
same one which appeared in earlier figures. Thedistortion is readily apparent. 
Figures A-6 through A-8 show the results of our method without correcting for 
distortion. 

I n general, a more accurate model produces more accurate results. Correct- 
ing for radial distortion undoubtedly would improvethe results of our method. 
However, as shown in this appendix for enhanced reality visualization using 
reasonable cameras in realistic viewing situations the improvement is almost 
insignificant. 
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FigureA-1: Radial distortion in pixels FigureA-2: Radial distortion in pixels 
for a 16mm lens. for a 25mm lens. 





FigureA-3: Effect of radial distortion FigureA-4: Effect of radial distortion 
on our method with thefiducials near on our method with thefiducials near 
the center of the i mage. the edge of the i mage. 
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FigureA-5: I mage with signifi- 
cant distortion. 



Figure A-6: Distorted image 
view 1. 





Figure A-7: Distorted image Figure A-8: Distorted image 
view 2. view 3. 
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